[IntelMQ-users] [IntelMQ] Deduplication on an optional field

Sebastian Wagner wagner at cert.at
Fri Jul 30 09:42:00 CEST 2021


Hi,

On 7/26/21 3:04 PM, Guillaume GRANJON DE LEPINEY wrote:
> I wonder if there is a simple way to use a Deduplicator bot on an
> optional field. When I apply the deduplicator to an optional field,
> the missing (null) value apparently gets stored in Redis as well,
> because all messages (except the first one) that do not contain the
> field are dropped.
>
> Is there a workaround please?
>
> I could work around this problem by adding two Sieve bots after the
> preceding bot that would skip the Deduplicator bot if the message
> doesn't have the field, but I don't find that optimal.
> So I am open to any proposal that could help me.
>
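The message hash method ignores any key that does not exist in the message,
so if the optional field is the only filter key, all messages lacking it
produce the same hash and every one after the first is seen as a duplicate: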
https://github.com/certtools/intelmq/blob/8a8107ec6b332e710626d056b2b0446ab976775f/intelmq/lib/message.py#L404-L405

    if filter_type == "whitelist" and key not in filter_keys:
        continue
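To make the effect concrete, here is a simplified sketch of whitelist hashing on plain dicts. It is modeled on the linked method but is not IntelMQ's code: the function name `whitelist_hash`, the use of plain dicts instead of `Message` objects, and the UTF-8/`repr` encoding are assumptions for illustration.

```python
import hashlib
from typing import Iterable, Mapping


def whitelist_hash(event: Mapping[str, object], filter_keys: Iterable[str]) -> str:
    """Hypothetical, simplified stand-in for a whitelist-filtered message hash.

    Like the linked method, it loops over the keys present in the event,
    so a whitelisted key that is absent contributes nothing at all.
    """
    keys = set(filter_keys)
    digest = hashlib.sha256()
    for key, value in sorted(event.items()):
        if key not in keys:
            continue  # skip keys outside the whitelist
        digest.update(key.encode("utf-8"))
        digest.update(b"\xc0")
        digest.update(repr(value).encode("utf-8"))
        digest.update(b"\xc0")
    return digest.hexdigest()


# Two unrelated events that both lack the optional field "source.fqdn"
a = {"source.ip": "192.0.2.1", "classification.type": "scanner"}
b = {"source.ip": "198.51.100.7", "classification.type": "spam"}
print(whitelist_hash(a, ["source.fqdn"]) == whitelist_hash(b, ["source.fqdn"]))  # True
```

Both digests are the SHA-256 of empty input, so a deduplicator keyed on this hash would treat the second event as a duplicate of the first.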

You could filter these messages out just before the deduplicator. I don't
see a reason for /two/ sieve bots, though: one should be sufficient when
combined with paths (see
https://intelmq.readthedocs.io/en/latest/user/bots.html#sieve).
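The single-sieve approach can be modeled in plain Python: one routing decision sends events that carry the field through the deduplicator and everything else around it. This is a hedged simulation of the pipeline's behavior, not sieve syntax; `route`, the path names `"dedup"` and `"bypass"`, the field `source.fqdn`, and the in-memory set standing in for the Redis cache are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List

FIELD = "source.fqdn"  # hypothetical optional field to deduplicate on


def route(event: Dict[str, str]) -> str:
    """Hypothetical stand-in for one sieve rule: pick a path by field presence."""
    return "dedup" if FIELD in event else "bypass"


def field_hash(event: Dict[str, str]) -> str:
    """Hash only the deduplication field (the caller ensures it is present)."""
    return hashlib.sha256(f"{FIELD}={event[FIELD]}".encode("utf-8")).hexdigest()


def run_pipeline(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    seen = set()  # stands in for the deduplicator's Redis cache
    output = []
    for event in events:
        if route(event) == "bypass":
            output.append(event)  # never reaches the deduplicator
            continue
        digest = field_hash(event)
        if digest in seen:
            continue  # duplicate: dropped
        seen.add(digest)
        output.append(event)
    return output


events = [
    {"source.ip": "192.0.2.1"},
    {"source.ip": "192.0.2.2"},
    {"source.ip": "192.0.2.3", FIELD: "example.com"},
    {"source.ip": "192.0.2.4", FIELD: "example.com"},
]
print(len(run_pipeline(events)))  # 3
```

Both events without the field survive, and only the repeated `example.com` event is dropped; in a real setup the two paths would simply rejoin at the next bot.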

(btw: If someone tackles
https://github.com/certtools/intelmq/issues/1250, the simpler filter
expert would also work)

If that's not viable for you, then you'd need to adapt the
deduplicator's code a bit, probably also introducing additional
parameters. Using Message.set_default_value is not possible either, as
that would fill in the same constant for every message lacking the field,
leading to the same behavior you have now.
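A rough sketch of what such an adaptation could look like, assuming a new parameter (here called `bypass_missing`, a hypothetical name that does not exist in IntelMQ) that lets events lacking a filter key pass through unchanged. The class is a toy model, not IntelMQ's deduplicator; it also mimics the constant-default approach to show why that does not help.

```python
import hashlib
from typing import Dict, Optional, Sequence, Set


class SketchDeduplicator:
    """Hypothetical deduplicator model; not IntelMQ's deduplicator bot."""

    def __init__(self, filter_keys: Sequence[str], bypass_missing: bool = False,
                 default: Optional[str] = None) -> None:
        self.filter_keys = list(filter_keys)
        self.bypass_missing = bypass_missing  # hypothetical new parameter
        self.default = default  # mimics filling in a constant default value
        self.cache: Set[str] = set()  # in-memory stand-in for Redis

    def _hash(self, event: Dict[str, str]) -> str:
        # Whitelist-style hash: absent keys contribute nothing.
        digest = hashlib.sha256()
        for key in sorted(self.filter_keys):
            if key in event:
                digest.update(f"{key}={event[key]!r};".encode("utf-8"))
        return digest.hexdigest()

    def process(self, event: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Return the event if it is forwarded, or None if it is dropped."""
        missing = [key for key in self.filter_keys if key not in event]
        if missing and self.bypass_missing:
            return event  # forward without consulting or updating the cache
        if missing and self.default is not None:
            # Every incomplete event gets the same constant, hence the same hash.
            event = {**event, **{key: self.default for key in missing}}
        digest = self._hash(event)
        if digest in self.cache:
            return None  # duplicate: dropped
        self.cache.add(digest)
        return event


events = [{"source.ip": "192.0.2.1"}, {"source.ip": "192.0.2.2"}]
for bot in (SketchDeduplicator(["source.fqdn"]),
            SketchDeduplicator(["source.fqdn"], default="n/a"),
            SketchDeduplicator(["source.fqdn"], bypass_missing=True)):
    forwarded = [e for e in events if bot.process(e) is not None]
    print(len(forwarded))  # prints 1, then 1, then 2
```

Only the bypass variant forwards both incomplete events; whether a real parameter should forward or drop such events would be a design decision for the bot.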

I hope that helps a bit

Sebastian

-- 
// Sebastian Wagner <wagner at cert.at> - T: +43 676 898 298 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
