Hi,

On 8/6/21 9:34 AM, Guillaume GRANJON DE LEPINEY wrote:
Thank you for taking the time to answer all my questions.          

 

I've already learned a few things from your email that I'm going to apply.

However, during my tests I had the impression that messages were being dropped when they didn't have the key.

Yes, it depends on the values of the other fields: if they are identical, the events get dropped, because the message-hash algorithm simply ignores non-existing fields.

Sebastian

I'll look into the issue when I have more time in the coming weeks.

I will not hesitate to contact you again.

 

Thanks,

 

Guillaume GRANJON de LÉPINEY | ggranjon@excellium-services.be | PGP Key ID: 0xE2FD5ED1
CERT-XLM Incident Handler @ excellium-services.com
CERT-XLM | cert@excellium-services.com | PGP Key ID: 0xD74E5AC0
Excellium Services Belgium N.V. | Orion Bldg, Belgicastraat 13, B-1930 Zaventem, Belgium
Mobile: +32 4 71 98 57 65
Emergency: +352 262 039 64 708 | emergency@excellium-services.com | PGP Key ID: 0x42662EFE

 

From: Sebastian Wagner <wagner@cert.at>
Sent: Friday 30 July 2021 09:42
To: Guillaume GRANJON DE LEPINEY <ggranjon@excellium-services.be>; 'intelmq-users@lists.cert.at' <intelmq-users@lists.cert.at>
Subject: Re: [IntelMQ-users] [IntelMQ] Deduplication on an optional field

 

Hi,

On 7/26/21 3:04 PM, Guillaume GRANJON DE LEPINEY wrote:

I wonder whether there is a simple way to use a Deduplicator bot on an optional field. When I apply the deduplicator to an optional field, it seems that a null value gets stored in Redis, because all messages that do not contain the field (except the first one) are dropped.

Is there a workaround please?

 

I could work around this problem by adding two Sieve bots after the preceding bot that would bypass the Deduplicator bot when the message doesn't have the field, but I don't find that optimal. I am therefore open to any suggestion that could help.

The message-hash method ignores any non-existing key: https://github.com/certtools/intelmq/blob/8a8107ec6b332e710626d056b2b0446ab976775f/intelmq/lib/message.py#L404-L405

    if filter_type == "whitelist" and key not in filter_keys:
        continue
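To illustrate the effect, here is a minimal sketch of a whitelist-style hash. It is a simplified stand-in, not IntelMQ's actual implementation: the function name `whitelist_hash`, the use of SHA-256 over JSON, and the sample events are assumptions made for the example. Only the "skip keys that are not whitelisted" step mirrors the two lines quoted above.

```python
import hashlib
import json


def whitelist_hash(event: dict, filter_keys: set) -> str:
    """Hash only the whitelisted keys that are actually present in the event.

    Hypothetical helper for illustration; a missing key contributes nothing.
    """
    selected = {}
    for key, value in sorted(event.items()):
        if key not in filter_keys:
            continue  # same idea as the whitelist check quoted above
        selected[key] = value
    payload = json.dumps(selected, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


keys = {"source.ip", "source.fqdn"}  # "source.fqdn" plays the optional field

a = {"source.ip": "192.0.2.1", "feed.name": "feed-a"}
b = {"source.ip": "192.0.2.1", "feed.name": "feed-b"}
c = {"source.ip": "192.0.2.2"}
d = {"source.ip": "192.0.2.1", "source.fqdn": "example.com"}

# a and b both lack the optional field and agree on the other whitelisted
# field, so they hash identically and the second one would be dropped.
print(whitelist_hash(a, keys) == whitelist_hash(b, keys))
# c differs in another whitelisted field, d carries the optional field.
print(whitelist_hash(a, keys) == whitelist_hash(c, keys))
print(whitelist_hash(a, keys) == whitelist_hash(d, keys))
```

If the optional field is the only whitelisted key, every event without it reduces to the same empty selection, which matches the observation that all such messages except the first are dropped.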

You could filter these messages out just before the deduplicator. I don't see a reason for two sieve bots, though: one should be sufficient if you use paths (see https://intelmq.readthedocs.io/en/latest/user/bots.html#sieve).

(btw: If someone tackles https://github.com/certtools/intelmq/issues/1250, the simpler filter expert would also work)

If that's not viable for you, then you'd need to adapt the deduplicator's code a bit, probably also introducing additional parameters. Using the Message.set_default_value is not possible either, as that would set a constant, leading to the same behavior as you have now.
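As a rough idea of what such an adaptation might look like, the sketch below lets events that lack a given field bypass deduplication entirely. Everything here is hypothetical: the class `SketchDeduplicator`, the parameter `bypass_if_missing`, and the in-memory set standing in for the Redis cache are assumptions for illustration and do not exist in IntelMQ.

```python
import hashlib
import json


class SketchDeduplicator:
    """Toy deduplicator; not the IntelMQ bot, only an outline of the idea."""

    def __init__(self, filter_keys, bypass_if_missing=()):
        self.filter_keys = set(filter_keys)
        # Hypothetical parameter: fields whose absence skips deduplication.
        self.bypass_if_missing = set(bypass_if_missing)
        self.seen = set()  # stand-in for the Redis cache

    def _hash(self, event: dict) -> str:
        selected = {k: v for k, v in event.items() if k in self.filter_keys}
        payload = json.dumps(selected, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def should_forward(self, event: dict) -> bool:
        # Events lacking a required field are passed on without being cached.
        if any(key not in event for key in self.bypass_if_missing):
            return True
        digest = self._hash(event)
        if digest in self.seen:
            return False  # duplicate: drop
        self.seen.add(digest)
        return True


dedup = SketchDeduplicator(["source.fqdn"], bypass_if_missing=["source.fqdn"])
first_without = dedup.should_forward({"source.ip": "192.0.2.1"})
second_without = dedup.should_forward({"source.ip": "192.0.2.2"})
first_with = dedup.should_forward({"source.fqdn": "example.com"})
repeat_with = dedup.should_forward({"source.fqdn": "example.com"})
```

With this behaviour, events without the field are never treated as duplicates of each other, while events that carry the field are still deduplicated on its value.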

I hope that helps a bit

Sebastian

-- 
// Sebastian Wagner <wagner@cert.at> - T: +43 676 898 298 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg