[IntelMQ-users] [IntelMQ] Deduplication on an optional field

Guillaume GRANJON DE LEPINEY ggranjon at excellium-services.be
Fri Aug 6 09:34:35 CEST 2021


Hi,

Thank you for taking the time to answer all my questions.

I've already learned a few things from your email that I'm going to apply.
However, during my tests I had the impression that messages were being dropped when they didn't have the key. I'll look into the issue when I have more time in the coming weeks.
I will not hesitate to contact you again.

Thanks,

Guillaume GRANJON de LÉPINEY | ggranjon at excellium-services.be | PGP Key ID: 0xE2FD5ED1 <https://pgp.circl.lu/pks/lookup?search=0xE2FD5ED1&fingerprint=on&op=index>
CERT-XLM Incident Handler @ excellium-services.com
CERT-XLM | cert at excellium-services.com | PGP Key ID: 0xD74E5AC0 <http://pgp.circl.lu/pks/lookup?op=vindex&fingerprint=on&search=0x67B311E5D74E5AC0>
Excellium Services Belgium N.V. | Orion Bldg, Belgicastraat 13, B-1930 Zaventem, Belgium
Mobile: +32 4 71 98 57 65
Emergency: +352 262 039 64 708 | emergency at excellium-services.com | PGP Key ID: 0x42662EFE <https://excellium-services.com/assets/EMERGENCY_PKEY.asc>

From: Sebastian Wagner <wagner at cert.at>
Sent: Friday, 30 July 2021 09:42
To: Guillaume GRANJON DE LEPINEY <ggranjon at excellium-services.be>; 'intelmq-users at lists.cert.at' <intelmq-users at lists.cert.at>
Subject: Re: [IntelMQ-users] [IntelMQ] Deduplication on an optional field


Hi,
On 7/26/21 3:04 PM, Guillaume GRANJON DE LEPINEY wrote:
I wonder if there is a simple way to use a Deduplicator bot on an optional field. When I apply the deduplicator to an optional field, it seems that the null value gets stored in Redis, because all messages (except the first one) that do not contain the field are dropped.
Is there a workaround?

I could work around this problem by adding two Sieve bots after the preceding bot, which would route messages around the Deduplicator bot if they don't have the field, but I don't find that optimal. So I am open to any suggestion that could help me.

The message-hash method ignores any non-existing key: https://github.com/certtools/intelmq/blob/8a8107ec6b332e710626d056b2b0446ab976775f/intelmq/lib/message.py#L404-L405
if filter_type == "whitelist" and key not in filter_keys:
    continue
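To illustrate why every message lacking the field is dropped after the first one, here is a minimal Python sketch of a whitelist-filtered hash. It is a simplified re-implementation written for this explanation, not IntelMQ's actual `Message.hash` code, and the field names and values are made up. Because a missing key is never visited, all messages without the whitelisted field hash the same (empty) input and therefore collide.

```python
import hashlib


def message_hash(event: dict, filter_keys: frozenset = frozenset(),
                 filter_type: str = "blacklist") -> str:
    """Simplified stand-in for IntelMQ's Message.hash (not the real code).

    Only keys present in the event are iterated, so a whitelisted field
    that is missing simply contributes nothing to the hash.
    """
    digest = hashlib.sha256()
    for key, value in sorted(event.items()):
        if filter_type == "whitelist" and key not in filter_keys:
            continue
        if filter_type == "blacklist" and key in filter_keys:
            continue
        digest.update(key.encode("utf-8"))
        digest.update(b"\xc0")
        digest.update(repr(value).encode("utf-8"))
        digest.update(b"\xc0")
    return digest.hexdigest()


FIELDS = frozenset({"source.fqdn"})  # hypothetical optional field
wl = "whitelist"

with_a = {"source.ip": "192.0.2.1", "source.fqdn": "a.example"}
with_b = {"source.ip": "192.0.2.2", "source.fqdn": "b.example"}
without_1 = {"source.ip": "192.0.2.3"}
without_2 = {"source.ip": "198.51.100.7"}

# Different values of the whitelisted field -> different hashes.
print(message_hash(with_a, FIELDS, wl) != message_hash(with_b, FIELDS, wl))  # True
# Field missing in both -> both hash the empty input -> identical hashes.
print(message_hash(without_1, FIELDS, wl) == message_hash(without_2, FIELDS, wl))  # True
```

The second comparison is exactly the situation described in the question: once the first field-less message is cached, every later field-less message looks like a duplicate.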

You could filter these messages out just before the deduplicator. I don't see a reason for two Sieve bots, though: one should be sufficient, combined with paths (see https://intelmq.readthedocs.io/en/latest/user/bots.html#sieve).
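The single-Sieve approach boils down to one routing decision per message. The sketch below models it in Python rather than in Sieve syntax; the field name and the route labels `"to-deduplicator"` and `"bypass"` are hypothetical, and an in-memory set stands in for the deduplicator's Redis cache.

```python
from typing import Iterable, List

DEDUP_FIELD = "source.fqdn"  # hypothetical optional field used for deduplication


def route(event: dict) -> str:
    """Mimic a single Sieve rule: only events that have the field reach the deduplicator."""
    return "to-deduplicator" if DEDUP_FIELD in event else "bypass"


def run_pipeline(events: Iterable[dict]) -> List[dict]:
    seen = set()  # stands in for the deduplicator's Redis cache
    output = []
    for event in events:
        if route(event) == "bypass":
            output.append(event)  # skips deduplication entirely
            continue
        key = event[DEDUP_FIELD]
        if key in seen:
            continue  # duplicate value of the field: dropped
        seen.add(key)
        output.append(event)
    return output


events = [
    {"source.ip": "192.0.2.1", "source.fqdn": "a.example"},
    {"source.ip": "192.0.2.2", "source.fqdn": "a.example"},  # duplicate fqdn
    {"source.ip": "192.0.2.3"},
    {"source.ip": "192.0.2.4"},
]
result = run_pipeline(events)
print([e["source.ip"] for e in result])  # ['192.0.2.1', '192.0.2.3', '192.0.2.4']
```

In an actual botnet this decision would be expressed as a Sieve rule (e.g. with the `:exists` operator and the `path` action) as described in the linked Sieve documentation, with the field-less path connected directly to the bot after the deduplicator.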

(btw: If someone tackles https://github.com/certtools/intelmq/issues/1250, the simpler filter expert would also work)

If that's not viable for you, then you'd need to adapt the deduplicator's code a bit, probably also introducing additional parameters. Using Message.set_default_value is not an option either, as that would set a constant value, leading to the same behavior you have now.
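If adapting the deduplicator is the way to go, one possible shape of the change is a switch that lets events lacking any of the configured fields bypass the cache. This is a toy Python sketch, not the actual Deduplicator expert: `ToyDeduplicator`, `dedup_key` and the `bypass_if_missing` parameter are hypothetical names, and a Python set replaces Redis. The last call also illustrates why a constant default does not help: every message without the field receives the same value and therefore the same hash.

```python
import hashlib
from typing import Optional


def dedup_key(event: dict, fields: tuple, bypass_if_missing: bool) -> Optional[str]:
    """Hypothetical helper; `bypass_if_missing` is an invented parameter.

    Returns None when the event should skip deduplication, otherwise a hash
    over the selected fields that are present.
    """
    if bypass_if_missing and any(f not in event for f in fields):
        return None  # never compared against the cache, so never dropped
    digest = hashlib.sha256()
    for f in sorted(fields):
        if f in event:
            digest.update(f"{f}={event[f]!r}".encode("utf-8"))
    return digest.hexdigest()


class ToyDeduplicator:
    def __init__(self, fields: tuple, bypass_if_missing: bool = True):
        self.fields = fields
        self.bypass_if_missing = bypass_if_missing
        self.cache = set()  # stands in for Redis

    def process(self, event: dict) -> bool:
        """Return True if the event is forwarded, False if dropped as a duplicate."""
        key = dedup_key(event, self.fields, self.bypass_if_missing)
        if key is None:
            return True
        if key in self.cache:
            return False
        self.cache.add(key)
        return True


events = [{"source.ip": "192.0.2.1"}, {"source.ip": "192.0.2.2"}]

patched = ToyDeduplicator(("source.fqdn",), bypass_if_missing=True)
print([patched.process(e) for e in events])  # [True, True]

current = ToyDeduplicator(("source.fqdn",), bypass_if_missing=False)
print([current.process(e) for e in events])  # [True, False]

# Filling the field with a constant default collides in the same way:
defaulted = ToyDeduplicator(("source.fqdn",), bypass_if_missing=False)
print([defaulted.process({**e, "source.fqdn": "N/A"}) for e in events])  # [True, False]
```

Whether such a switch should default to on or off is a design decision for the bot's maintainers; off would keep today's behavior for existing configurations.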

I hope that helps a bit

Sebastian

--

// Sebastian Wagner <wagner at cert.at> - T: +43 676 898 298 7201

// CERT Austria - https://www.cert.at/

// Eine Initiative der nic.at GmbH - https://www.nic.at/

// Firmenbuchnummer 172568b, LG Salzburg
