Hi,
Ok, thanks. I'll consider that.
Br, Mika
----- Original Message -----
From: "elsif" elsif@shadowserver.org
To: "Mika Silander" mika.silander@csc.fi
Sent: Tuesday, 12 November, 2024 17:03:15
Subject: Re: [IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser
Hello,
Please create an issue (https://github.com/certtools/intelmq/issues/new/choose) for your proposed changes on how the parser should operate as this would be a departure from how it has historically functioned.
The IPv6 scan6_ntp and scan6_ssl_poodle feeds use the more specific extra.source.naics (10449) and extra.source.sector (10459) fields while the IPv4 scan_ntp and scan_ssl_poodle feeds use extra.naics (21262) and extra.sector (21272) for backwards compatibility.
Regards,
Jason
On 11/12/24 1:07 AM, Mika Silander wrote:
Hi Jason & all,
Thanks for the clarification. I had taken intelmq.json for a spec but, based on what you say, it's more like a memorandum of intent. The more undocumented special cases that lurk between the lines in intelmq.json, the less useful it becomes. My strong recommendation here would be: please document all feed input fields with their corresponding output fields. This should be a one-to-one mapping and be respected by the Shadowserver parser bot. No other fields should magically be added by the parser. Otherwise, the only way to check for event coherence is through event sampling - something I'd dearly like to avoid.
As for the cases I brought up for discussion, imho there are two ways of solving them to make intelmq.json a true spec: document both possible output field names as optional, or, better, merge the two possible field names into one and improve the parser's converter function to handle both cases.
Following the latter option, the parser could output "http_url" in a single field named "extra.partial_or_complete_url" (thus getting rid of the lottery between "extra.http_url" and "destination.url"), with the converter function in the parser accepting both partial and complete URLs. In other words, we'd have something like the following for Sinkhole Events HTTP and friends:
[ "extra.", "partial_or_complete_url", "convert_to_possibly_incomplete_url" ],
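A converter of this kind might look roughly like the sketch below. This is only an illustration of the proposal, not code from the IntelMQ Shadowserver parser: the function name is the hypothetical one from the mapping line above, and the acceptance rules (http/https URLs with a host, or a bare path starting with "/") are assumptions.

```python
from urllib.parse import urlsplit


def convert_to_possibly_incomplete_url(value):
    """Hypothetical converter: accept a complete URL or a bare path.

    Returns the stripped value if it looks like either form, else None.
    """
    if not value:
        return None
    value = value.strip()
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 brackets
        return None
    if parts.scheme in ("http", "https") and parts.netloc:
        return value  # complete URL
    if value.startswith("/"):
        return value  # partial URL (path only)
    return None


print(convert_to_possibly_incomplete_url("/index.php"))
print(convert_to_possibly_incomplete_url("http://example.com/index.php"))
print(convert_to_possibly_incomplete_url("not a url"))
```

With a converter like this, both forms would end up in the same output field, so a consumer would not need to look in two places.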
For Sandbox URL, we could stick to "extra.user_agent" without exceptions:
[ "extra.", "user_agent", "validate_to_none" ],
This morning's harvest from our checker is:
Shadowserver/NTP-Version: extra.source.naics, extra.source.sector
Shadowserver/SSL-POODLE-Vulnerable-Servers IPv4: extra.source.naics, extra.source.sector
but there is no trace of these fields in intelmq.json :-(. Maybe you could include them as well in the next round of schema updates?
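For reference, a check of this kind can be sketched as below. It is a simplified illustration, not our actual checker: it assumes each mapping entry has the output key first and the input field name second (as in the entries quoted in this thread), that a key ending in "." is a prefix joined with the input name, and the event values are made up.

```python
def declared_outputs(mappings):
    """Collect the output field names a list of mapping entries declares."""
    fields = set()
    for entry in mappings:
        out_key, in_key = entry[0], entry[1]
        # Assumption: a key such as "extra." is a prefix for the input name.
        if out_key.endswith("."):
            fields.add(out_key + in_key)
        else:
            fields.add(out_key)
    return fields


def undeclared_fields(event, mappings):
    """Return the event's field names that no mapping entry declares."""
    return sorted(set(event) - declared_outputs(mappings))


# Illustrative mapping entries and a made-up event
mappings = [
    ["destination.url", "http_url", "convert_http_host_and_url", True],
    ["extra.", "user_agent", "validate_to_none"],
]
event = {
    "destination.url": "http://example.com/",
    "extra.user_agent": "example-agent",
    "extra.source.naics": "example-value",
}
print(undeclared_fields(event, mappings))
```

Any field name the function returns is one that appears in an event without a corresponding entry in the mapping.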
And the above was only intended for improving things, not a criticism. I, like the rest of the security community, appreciate and value the work of Shadowserver.
Br, Mika
----- Original Message -----
From: "elsif" elsif@shadowserver.org
To: "intelmq-dev" intelmq-dev@lists.cert.at
Sent: Monday, 11 November, 2024 18:08:06
Subject: Re: [IntelMQ-dev] Question on a few feeds' field mappings in Shadowserver parser
Hello,
I will update the user_agent in the Sandbox URL mapping to be more explicit in the next schema update.
The reason that the http_url is commonly not mapped to destination.url for the Sinkhole Events HTTP reports is that the value is often not a fully qualified URL. A value such as "/index.php", without any other context, fails the validation for the type "URL" as specified in the harmonization configuration. When the value fails validation, it is added to extra instead.
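The fallback described above can be pictured with the following minimal sketch. It is not the parser's actual code: the helper names are hypothetical, and the URL check (a scheme plus a host) is only a rough stand-in for the harmonization "URL" validation.

```python
from urllib.parse import urlparse


def is_fully_qualified_url(value):
    """Rough stand-in (assumption) for the harmonization "URL" type check."""
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc)


def map_field(event, target, source_name, value):
    """Store value under target if it validates, else under extra.<source_name>."""
    if is_fully_qualified_url(value):
        event[target] = value
    else:
        event["extra." + source_name] = value


event = {}
map_field(event, "destination.url", "http_url", "/index.php")
print(event)  # the bare path lands in extra.http_url
```

A complete value such as "http://example.com/index.php" would pass the check in this sketch and be stored in destination.url.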
Regards,
Jason
On 11/10/24 11:06 PM, Mika Silander via IntelMQ-dev wrote:
Hi,
We discovered a few Shadowserver feeds with field mappings that look somewhat odd. We use the feed mapping file https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/intel... and our Shadowserver parser bot is configured to update its own copy of this file dynamically. Our IntelMQ installation and its Shadowserver parser bot come from the intelmq git repo, branch release-3.3.1.
In the intelmq.json file mentioned above, the Sandbox URL feed defines the optional input field "user_agent" to be parsed on output to "user_agent" (right?):
[ "user_agent", "user_agent", "validate_to_none" ],
However, the parser bot appears to output "extra.user_agent" instead.
The other mapping that seemed odd was in Sinkhole Events HTTP IPv4 & IPv6 (and in Microsoft Sinkhole Events HTTP IPv4):
[ "destination.url", "http_url", "convert_http_host_and_url", true ],
I interpret here that the optional input field "http_url" should be mapped by the Shadowserver parser bot to "destination.url" on output, but we seem to get it mapped to "extra.http_url" instead.
Is this a hiccup in intelmq.json or in the parser, or have I (again) missed something? Anyone else seeing this?
Best regards, Mika
_______________________________________________
IntelMQ-dev mailing list
https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev
https://docs.intelmq.org/