[IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser
Mika Silander
mika.silander at csc.fi
Tue Nov 12 10:07:05 CET 2024
Hi Jason & all,
Thanks for the clarification. I had taken intelmq.json for a spec but, based on what you say, it's more like a memorandum of intent. The more undocumented special cases lurk between the lines in intelmq.json, the less useful it becomes. My strong recommendation would be to document all feed input fields with their corresponding output fields. This should be a one-to-one mapping that the Shadowserver parser bot respects. The parser should not magically add any other fields. Otherwise, the only way to check for event coherence is through event sampling - something I'd dearly like to avoid.
As for the cases I brought up for discussion, imho there are two ways of solving them to make intelmq.json a true spec: document both possible output field names as optional or, better, merge the two possible field names into one and improve the parser's converter function to handle both cases.
Following the latter option, the "http_url" example could be output by the parser in a single field named "extra.partial_or_complete_url" (thus getting rid of the lottery between "extra.http_url" and "destination.url"), with the converter function in the parser accepting both partial and complete URLs. In other words, we'd have something like the following for Sinkhole Events HTTP and friends:
[
"extra.",
"partial_or_complete_url",
"convert_to_possibly_incomplete_url"
],
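A minimal Python sketch of what such a converter might look like. The function name is simply the one proposed above and does not exist in the parser today. The acceptance rules (a value with both scheme and host counts as complete, a value starting with "/" counts as partial) are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def convert_to_possibly_incomplete_url(value):
    """Hypothetical converter: accept complete URLs and path-only values.

    Returns the cleaned value, or None when it looks like neither.
    """
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    parts = urlsplit(value)
    # Complete URL: both a scheme and a host are present.
    if parts.scheme and parts.netloc:
        return value
    # Partial URL: a bare path such as "/index.php".
    if value.startswith("/"):
        return value
    # Anything else is rejected (assumed rule, for illustration).
    return None
```

With this rule both "/index.php" and "http://example.com/index.php" would end up in the same output field, while an empty value yields None.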
For Sandbox URL, we could stick to "extra.user_agent" without exceptions:
[
"extra.",
"user_agent",
"validate_to_none"
],
This morning's harvest from our checker is:
Shadowserver/NTP-Version: extra.source.naics, extra.source.sector
Shadowserver/SSL-POODLE-Vulnerable-Servers IPv4: extra.source.naics, extra.source.sector
but there is no trace of these fields in intelmq.json :-(. Maybe you could include them as well in the next round of schema updates?
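For reference, the kind of check that produces the list above can be sketched in Python as follows. This is not our actual checker: the function names are hypothetical, and the sketch assumes, based on the mapping fragments quoted in this thread, that each entry lists the output key first and the input column second, and that an output key ending in "." (such as "extra.") gets the input column name appended.

```python
def documented_fields(mapping_entries):
    """Collect the output field names that a feed's mapping entries declare."""
    fields = set()
    for entry in mapping_entries:
        output_key, input_column = entry[0], entry[1]
        # Assumed convention: "extra." + "user_agent" -> "extra.user_agent".
        if output_key.endswith("."):
            output_key += input_column
        fields.add(output_key)
    return fields


def undocumented_fields(event, mapping_entries):
    """Return the event keys that no mapping entry accounts for, sorted."""
    return sorted(set(event) - documented_fields(mapping_entries))


# Illustrative entries and a made-up event; the values are placeholders.
entries = [
    ["extra.", "user_agent", "validate_to_none"],
    ["destination.url", "http_url", "convert_http_host_and_url", True],
]
event = {
    "extra.user_agent": "placeholder",
    "extra.source.naics": 0,
    "extra.source.sector": "placeholder",
}
print(undocumented_fields(event, entries))
```

A real checker would also have to ignore the fields that every event carries regardless of the feed mapping, which this sketch does not attempt.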
The above is only intended to improve things, not as criticism. I, like the rest of the security community, appreciate and value the work of Shadowserver.
Br, Mika
----- Original Message -----
From: "elsif" <elsif at shadowserver.org>
To: "intelmq-dev" <intelmq-dev at lists.cert.at>
Sent: Monday, 11 November, 2024 18:08:06
Subject: Re: [IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser
Hello,
I will update the user_agent in the Sandbox URL mapping to be more
explicit in the next schema update.
The reason that http_url is commonly not mapped to destination.url
for the Sinkhole Events HTTP reports is that the value is often not a
fully qualified URL. A value such as "/index.php" without any other
context fails the validation for the type "URL" as specified in the
harmonization configuration. When the value fails validation, it is
added to extra instead.
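
The fallback described above can be illustrated roughly as follows. This is a simplified Python sketch, not the parser's actual code: the helper names are hypothetical, and the check for scheme and host is only a stand-in for the real "URL" type validation from the harmonization configuration.

```python
from urllib.parse import urlsplit


def looks_like_full_url(value):
    """Stand-in for the harmonization "URL" check: scheme and host required."""
    parts = urlsplit(value)
    return bool(parts.scheme and parts.netloc)


def map_http_url(event, value):
    """Put the value in destination.url if it validates, else in extra.http_url."""
    if looks_like_full_url(value):
        event["destination.url"] = value
    else:
        event["extra.http_url"] = value
    return event
```

Under these assumptions "/index.php" lands in extra.http_url, while "http://example.com/index.php" lands in destination.url.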
Regards,
Jason
On 11/10/24 11:06 PM, Mika Silander via IntelMQ-dev wrote:
> Hi,
>
> We discovered a few Shadowserver feeds with field mappings that look somewhat odd. We use the feed mapping file https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/intelmq.json and our Shadowserver parser bot is configured to update its own copy of this file dynamically. Our IntelMQ and its Shadowserver parser bot are from the intelmq git repo branch release-3.3.1.
>
> In the intelmq.json file mentioned above, the Sandbox URL feed defines the optional input field "user_agent" to be parsed on output to "user_agent" (right?):
>
> [
> "user_agent",
> "user_agent",
> "validate_to_none"
> ],
>
> However, the parser bot appears to output "extra.user_agent" instead.
>
> The other mapping that seemed odd was in Sinkhole Events HTTP IPv4 & IPv6 (and in Microsoft Sinkhole Events HTTP IPv4):
>
> [
> "destination.url",
> "http_url",
> "convert_http_host_and_url",
> true
> ],
>
> I interpret here that the optional input field "http_url" should be mapped by the Shadowserver parser bot to "destination.url" on output, but we seem to get it mapped to "extra.http_url" instead.
>
> Is this a hiccup in intelmq.json or the parser, or have I (again) missed something? Anyone else seeing this?
>
> Best regards, Mika
> _______________________________________________
> IntelMQ-dev mailing list
> https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://docs.intelmq.org/