[IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser

Mika Silander mika.silander at csc.fi
Thu Nov 14 09:45:30 CET 2024


Hi,

Ok, thanks. I'll consider that.

Br, Mika

----- Original Message -----
From: "elsif" <elsif at shadowserver.org>
To: "Mika Silander" <mika.silander at csc.fi>
Sent: Tuesday, 12 November, 2024 17:03:15
Subject: Re: [IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser

Hello,

Please create an issue 
(https://github.com/certtools/intelmq/issues/new/choose) for your 
proposed changes on how the parser should operate as this would be a 
departure from how it has historically functioned.

The IPv6 scan6_ntp and scan6_ssl_poodle feeds use the more specific 
extra.source.naics (10449) and extra.source.sector (10459) fields while 
the IPv4 scan_ntp and scan_ssl_poodle feeds use extra.naics (21262) and 
extra.sector (21272) for backwards compatibility.
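
For anyone consuming both the IPv4 and IPv6 variants, the naming difference above can be smoothed out downstream. The following is only a minimal sketch, assuming events are available as flat Python dicts keyed by field name; the alias table and the function name normalise_event are hypothetical and not part of IntelMQ or the Shadowserver parser:

```python
# Hypothetical alias table: legacy field name -> more specific field name.
LEGACY_ALIASES = {
    "extra.naics": "extra.source.naics",
    "extra.sector": "extra.source.sector",
}


def normalise_event(event):
    """Return a copy of the event with legacy field names renamed.

    If an event already carries the more specific name, that value wins
    and the legacy value is dropped.
    """
    # Copy every field that is not a legacy name unchanged.
    normalised = {k: v for k, v in event.items() if k not in LEGACY_ALIASES}
    # Rename legacy fields, without overwriting an existing specific field.
    for legacy, specific in LEGACY_ALIASES.items():
        if legacy in event:
            normalised.setdefault(specific, event[legacy])
    return normalised
```

With this, an event from scan_ntp and one from scan6_ntp would expose the same two field names to later processing steps.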

Regards,

Jason

On 11/12/24 1:07 AM, Mika Silander wrote:
> Hi Jason & all,
>
>   Thanks for the clarification. I had taken intelmq.json for a spec but, based on what you say, it's more like a memorandum of intent. The more undocumented special cases lurk between the lines in intelmq.json, the less useful it becomes. My strong recommendation would be to document all feed input fields with their corresponding output fields. This should be a one-to-one mapping, respected by the Shadowserver parser bot. No other fields should magically be added by the parser. Otherwise, the only way to check for event coherence is through event sampling, something I'd dearly like to avoid.
>
>   As for the cases I brought up for discussion, imho there are two ways of solving them to make intelmq.json a true spec: document both possible output field names as optional or, better, merge the two possible field names into one and improve the parser's converter function to handle both cases.
>
>   Following the latter option, the parser could output "http_url" in a single field named "extra.partial_or_complete_url" (thus getting rid of the lottery between "extra.http_url" and "destination.url"), with the converter function in the parser accepting both partial and complete URLs. In other words, we'd have something like the following for Sinkhole Events HTTP and friends:
>
>             [
>                "extra.",
>                "partial_or_complete_url",
>                "convert_to_possibly_incomplete_url"
>             ],
>
>   For Sandbox URL, we could stick to "extra.user_agent" without exceptions:
>
>             [
>                "extra.",
>                "user_agent",
>                "validate_to_none"
>             ],
>   
>   This morning's harvest from our checker is:
>
>      Shadowserver/NTP-Version: extra.source.naics, extra.source.sector
>      Shadowserver/SSL-POODLE-Vulnerable-Servers IPv4: extra.source.naics, extra.source.sector
>
>   but there is no trace of these fields in intelmq.json :-(. Maybe you could include them as well in the next round of schema updates?
>
>   The above is only intended to improve things, not as criticism. I, like the rest of the security community, appreciate and value the work of Shadowserver.
>
> Br, Mika
>
> ----- Original Message -----
> From: "elsif" <elsif at shadowserver.org>
> To: "intelmq-dev" <intelmq-dev at lists.cert.at>
> Sent: Monday, 11 November, 2024 18:08:06
> Subject: Re: [IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser
>
> Hello,
>
> I will update the user_agent in the Sandbox URL mapping to be more
> explicit in the next schema update.
>
> The reason that the http_url is commonly not mapped to destination.url
> for the Sinkhole Events HTTP reports is that the value is often not a
> fully qualified URL (for example "/index.php" without any other
> context), which fails the validation for the type "URL" as specified in
> the harmonization configuration.  When the value fails validation it is
> added to extra instead.
>
> Regards,
>
> Jason
>
> On 11/10/24 11:06 PM, Mika Silander via IntelMQ-dev wrote:
>> Hi,
>>
>>    We discovered a few Shadowserver feeds with field mappings that look somewhat odd. We use the feed mapping file https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/intelmq.json and our Shadowserver parser bot is configured to update its own copy of this file dynamically. Our IntelMQ and its Shadowserver parser bot come from the intelmq git repo branch release-3.3.1.
>>
>>    In the intelmq.json file mentioned above, the Sandbox URL feed defines the optional input field "user_agent" to be parsed on output to "user_agent" (right?):
>>
>>            [
>>               "user_agent",
>>               "user_agent",
>>               "validate_to_none"
>>            ],
>>
>>    However, the parser bot appears to output "extra.user_agent" instead.
>>
>>    The other mapping that seemed odd was in Sinkhole Events HTTP IPv4 & IPv6 (and in Microsoft Sinkhole Events HTTP IPv4):
>>
>>           [
>>               "destination.url",
>>               "http_url",
>>               "convert_http_host_and_url",
>>               true
>>            ],
>>
>>    I interpret here that the optional input field "http_url" should be mapped by the Shadowserver parser bot to "destination.url" on output, but we seem to get it mapped to "extra.http_url" instead.
>>
>>    Is this a hiccup in intelmq.json or in the parser, or have I (again) missed something? Anyone else seeing this?
>>
>> Best regards, Mika
>> _______________________________________________
>> IntelMQ-dev mailing list
>> https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://docs.intelmq.org/
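
The fallback described earlier in the thread for "http_url" (a value that fails validation as a URL ends up under extra instead of destination.url) can be sketched roughly as follows. This is an illustration under assumptions, not the parser's actual code: the check via urllib.parse is a stand-in for IntelMQ's own URL type validation, and the function names is_full_url and map_http_url are hypothetical.

```python
from urllib.parse import urlsplit


def is_full_url(value):
    """Stand-in check: a full URL has both a scheme and a network location."""
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc)


def map_http_url(row):
    """Map the feed's http_url column to an output field.

    Full URLs go to destination.url; anything else (such as a bare path)
    is kept under extra.http_url, mirroring the behaviour described above.
    """
    event = {}
    value = row.get("http_url")
    if not value:
        return event  # optional field absent or empty: nothing to map
    if is_full_url(value):
        event["destination.url"] = value
    else:
        event["extra.http_url"] = value
    return event
```

A row carrying "/index.php" would yield extra.http_url, while a row carrying "http://example.com/index.php" would yield destination.url, which is why a checker sees both output names for the same input field.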

