<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Hi,<br>
</p>
<div class="moz-cite-prefix">On 7/26/21 3:04 PM, Guillaume GRANJON
DE LEPINEY wrote:<br>
</div>
<blockquote type="cite"
cite="mid:PA4PR10MB454476EA0368EBEC298F7F7D8CE89@PA4PR10MB4544.EURPRD10.PROD.OUTLOOK.COM">
<span
lang="EN-US">I wonder if there is a simple way to use a
Deduplicator bot on an optional field. When I apply the
deduplicator to an optional field, the null value appears to be
stored in Redis, because all messages (except the first one)
that do not contain the field are dropped.</span>
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Is there a workaround,
please?</span></p>
<p class="MsoNormal"><span lang="EN-US">I could work around this
problem by adding two Sieve bots after the preceding bot that
would skip the Deduplicator bot if the message doesn't have the
field, but I don't find that optimal. I am therefore open to
any proposal that could help me.</span></p>
</div>
</blockquote>
<p>The message hash method ignores any key that is not present in the message, so all messages lacking the field get the same hash:
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/8a8107ec6b332e710626d056b2b0446ab976775f/intelmq/lib/message.py#L404-L405">https://github.com/certtools/intelmq/blob/8a8107ec6b332e710626d056b2b0446ab976775f/intelmq/lib/message.py#L404-L405</a></p>
<pre>if filter_type == "whitelist" and key not in filter_keys:
    continue</pre>
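<p>To illustrate why the events without the field collide, here is a
minimal sketch. It is not IntelMQ's actual implementation: the
function name <code>whitelist_hash</code>, the JSON serialization
and the sample events are assumptions made for illustration only.
It just mimics the idea of hashing only the whitelisted keys that
are present.</p>
<pre>
```python
import hashlib
import json


def whitelist_hash(message, filter_keys):
    """Hash only the whitelisted keys that are present in the message.

    Hypothetical helper, not part of IntelMQ.
    """
    selected = {}
    for key, value in message.items():
        if key not in filter_keys:
            continue  # same idea as the whitelist check quoted above
        selected[key] = value
    encoded = json.dumps(selected, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Two unrelated events, neither contains the optional field.
first = {"source.ip": "192.0.2.1", "feed.name": "A"}
second = {"source.ip": "198.51.100.7", "feed.name": "B"}
filter_keys = {"source.fqdn"}

# Both hashes are computed over an empty selection, so they are equal.
print(whitelist_hash(first, filter_keys) == whitelist_hash(second, filter_keys))
```
</pre>
<p>Because nothing is selected in either event, the second one looks
like a duplicate of the first and is dropped.</p>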
<p>You could filter these messages out just before the
deduplicator. I don't see a reason for <i>two</i> sieve bots,
though: one should be sufficient if you use paths (see
<a class="moz-txt-link-freetext" href="https://intelmq.readthedocs.io/en/latest/user/bots.html#sieve">https://intelmq.readthedocs.io/en/latest/user/bots.html#sieve</a>).</p>
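<p>The routing decision of such a single sieve bot can be sketched in
plain Python. This is only a model of the logic, not sieve syntax
and not IntelMQ code: the function <code>choose_path</code> and the
path names <code>deduplicate</code> and <code>bypass</code> are
hypothetical, the real rule would live in the sieve file and the
paths in the pipeline configuration.</p>
<pre>
```python
def choose_path(event, field):
    """Return a (hypothetical) path name for the event.

    Events that carry the optional field go to the deduplicator,
    all others take a path that skips it.
    """
    if field in event:
        return "deduplicate"
    return "bypass"


events = [
    {"source.ip": "192.0.2.1", "source.fqdn": "a.example"},
    {"source.ip": "198.51.100.7"},
]
for event in events:
    print(choose_path(event, "source.fqdn"))
```
</pre>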
<p>(btw: If someone tackles
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/1250">https://github.com/certtools/intelmq/issues/1250</a>, the simpler
filter expert would also work)<br>
</p>
<p>If that's not viable for you, you would need to adapt the
deduplicator's code a bit, probably also introducing additional
parameters. Using Message.set_default_value does not help
either, as it would set a constant value, leading to the same
behavior as you have now.<br>
</p>
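<p>What such an adaptation could look like, as a rough sketch only:
the class <code>OptionalFieldDeduplicator</code> and the parameter
<code>pass_if_missing</code> are hypothetical and do not exist in
IntelMQ, and a Python set stands in for the Redis cache.</p>
<pre>
```python
class OptionalFieldDeduplicator:
    """Toy deduplicator that lets events without the filter keys pass.

    Hypothetical sketch, not the IntelMQ deduplicator expert.
    """

    def __init__(self, filter_keys, pass_if_missing=True):
        self.filter_keys = set(filter_keys)
        self.pass_if_missing = pass_if_missing  # hypothetical parameter
        self.seen = set()  # stands in for the Redis cache

    def should_forward(self, event):
        # Keep only the filter keys that exist in this event.
        present = {k: event[k] for k in self.filter_keys if k in event}
        if not present and self.pass_if_missing:
            return True  # nothing to deduplicate on, never drop
        # Values are assumed to be hashable (for example strings).
        fingerprint = tuple(sorted(present.items()))
        if fingerprint in self.seen:
            return False  # duplicate
        self.seen.add(fingerprint)
        return True


dedup = OptionalFieldDeduplicator(["source.fqdn"])
with_field = {"source.ip": "192.0.2.1", "source.fqdn": "a.example"}
without_field = {"source.ip": "198.51.100.7"}

print(dedup.should_forward(with_field))     # first occurrence
print(dedup.should_forward(with_field))     # same field value again
print(dedup.should_forward(without_field))  # field missing
print(dedup.should_forward(without_field))  # field missing again
```
</pre>
<p>With <code>pass_if_missing</code> set to false the sketch falls
back to the behavior you see now: every event without the field
shares one empty fingerprint, so only the first one is forwarded.</p>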
<p>I hope that helps a bit.</p>
<p>Sebastian<br>
</p>
<pre class="moz-signature" cols="72">--
// Sebastian Wagner <a class="moz-txt-link-rfc2396E" href="mailto:wagner@cert.at">&lt;wagner@cert.at&gt;</a> - T: +43 676 898 298 7201
// CERT Austria - <a class="moz-txt-link-freetext" href="https://www.cert.at/">https://www.cert.at/</a>
// Eine Initiative der nic.at GmbH - <a class="moz-txt-link-freetext" href="https://www.nic.at/">https://www.nic.at/</a>
// Firmenbuchnummer 172568b, LG Salzburg</pre>
</body>
</html>