[IntelMQ-users] [IntelMQ] Deduplication on an optional field
Sebastian Wagner
wagner at cert.at
Fri Jul 30 09:42:00 CEST 2021
Hi,
On 7/26/21 3:04 PM, Guillaume GRANJON DE LEPINEY wrote:
> I wonder if there is a simple way to use a Deduplicator bot on an
> optional field. Indeed, I noticed when I apply the deduplicator on an
> optional field that the null value must be entered in the redis
> because all messages (except the first one) that do not contain the
> field are dropped.
>
> Is there a workaround please?
>
> I could work around this problem by adding two Sieve bots at the exit
> of the precedent bot that would jump the Deduplicator bot if the
> message doesn't have the field, but I don't find that to be optimal.
> Thus, I am open to any proposal that could help me.
>
The message-hash method ignores any non-existing key:
https://github.com/certtools/intelmq/blob/8a8107ec6b332e710626d056b2b0446ab976775f/intelmq/lib/message.py#L404-L405
    if filter_type == "whitelist" and key not in filter_keys:
        continue
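To illustrate why every message without the field ends up with the same hash, here is a minimal sketch. It is a simplified stand-in, not IntelMQ's actual hashing code: the function name `whitelist_hash`, the use of SHA-256 over JSON-encoded pairs, and the sample events are all assumptions made for the example.

```python
import hashlib
import json


def whitelist_hash(event: dict, filter_keys: set) -> str:
    """Hash only the whitelisted keys that are actually present in the event.

    Hypothetical, simplified stand-in for the message hash: keys outside
    the whitelist are skipped, so an event that has none of the
    whitelisted keys contributes nothing to the digest.
    """
    digest = hashlib.sha256()
    for key, value in sorted(event.items()):
        if key not in filter_keys:
            continue  # same idea as the whitelist check quoted above
        digest.update(json.dumps([key, value]).encode("utf-8"))
    return digest.hexdigest()


keys = {"source.fqdn"}
with_field = {"source.ip": "192.0.2.1", "source.fqdn": "a.example.com"}
without_field_1 = {"source.ip": "192.0.2.2"}
without_field_2 = {"source.ip": "192.0.2.3"}

# Both events lack the whitelisted key, so nothing is fed into the digest
# and the two hashes are identical: the second one looks like a duplicate.
print(whitelist_hash(without_field_1, keys) == whitelist_hash(without_field_2, keys))
# An event that has the field hashes differently.
print(whitelist_hash(with_field, keys) == whitelist_hash(without_field_1, keys))
```

In this sketch the hash of a message without the field is just the digest of empty input, which is why only the first such message passes the deduplicator.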
One option is to filter these messages out just before the deduplicator.
I don't see a reason for /two/ sieve bots, though: one should be
sufficient if you use paths (see
https://intelmq.readthedocs.io/en/latest/user/bots.html#sieve).
(btw: If someone tackles
https://github.com/certtools/intelmq/issues/1250, the simpler filter
expert would also work)
If that's not viable for you, then you'd need to adapt the
deduplicator's code a bit, probably also introducing additional
parameters. Using the Message.set_default_value is not possible either,
as that would set a constant, leading to the same behavior as you have now.
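As a rough idea of what such an adaptation could look like, here is a sketch of the logic only. It does not use the real deduplicator bot or Redis: the class `DedupSketch`, the parameter `pass_if_missing`, and the in-memory set standing in for the cache are all hypothetical names chosen for the example.

```python
import hashlib
import json


class DedupSketch:
    """Hypothetical deduplication logic with a bypass for missing fields."""

    def __init__(self, filter_keys, pass_if_missing=True):
        self.filter_keys = set(filter_keys)
        # Assumed new parameter: let events through when none of the
        # filter keys are present, instead of hashing "nothing".
        self.pass_if_missing = pass_if_missing
        self.seen = set()  # stands in for the Redis cache

    def process(self, event: dict) -> bool:
        """Return True if the event should be forwarded, False if dropped."""
        has_any_key = any(key in event for key in self.filter_keys)
        if self.pass_if_missing and not has_any_key:
            return True  # nothing to deduplicate on, so do not drop
        digest = hashlib.sha256()
        for key, value in sorted(event.items()):
            if key in self.filter_keys:
                digest.update(json.dumps([key, value]).encode("utf-8"))
        fingerprint = digest.hexdigest()
        if fingerprint in self.seen:
            return False  # duplicate of an earlier event
        self.seen.add(fingerprint)
        return True


dedup = DedupSketch({"source.fqdn"})
print(dedup.process({"source.ip": "192.0.2.1"}))          # no field: forwarded
print(dedup.process({"source.ip": "192.0.2.2"}))          # no field: forwarded
print(dedup.process({"source.fqdn": "a.example.com"}))    # first occurrence
print(dedup.process({"source.fqdn": "a.example.com"}))    # duplicate
```

With `pass_if_missing=False` the sketch falls back to the behavior described in the question, where only the first message without the field gets through.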
I hope that helps a bit
Sebastian
--
// Sebastian Wagner <wagner at cert.at> - T: +43 676 898 298 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg