Hi,
We discovered a few Shadowserver feeds with field mappings that look somewhat odd. We use the feed mapping file https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/intel... and our Shadowserver parser bot is configured to update its own copy of this file dynamically. Our IntelMQ installation and its Shadowserver parser bot come from the intelmq Git repository, branch release-3.3.1.
In the intelmq.json file mentioned above, the Sandbox URL feed maps the optional input field "user_agent" to the output field "user_agent" (right?):
[ "user_agent", "user_agent", "validate_to_none" ],
However, the parser bot appears to output "extra.user_agent" instead.
The other mapping that seemed odd was in Sinkhole Events HTTP IPv4 & IPv6 (and in Microsoft Sinkhole Events HTTP IPv4):
[ "destination.url", "http_url", "convert_http_host_and_url", true ],
I interpret here that the optional input field "http_url" should be mapped by the Shadowserver parser bot to "destination.url" on output, but we seem to get it mapped to "extra.http_url" instead.
Is this a hiccup in intelmq.json or in the parser, or have I (again) missed something? Anyone else seeing this?
Best regards, Mika
Hi Mika
On 11/11/24 8:06 AM, Mika Silander via IntelMQ-dev wrote:
In the intelmq.json file mentioned above, the Sandbox URL feed maps the optional input field "user_agent" to the output field "user_agent" (right?):
[ "user_agent", "user_agent", "validate_to_none" ],
However, the parser bot appears to output "extra.user_agent" instead.
Yes: "user_agent" is used as a shortcut for "extra.user_agent", because the field "user_agent" does not exist in IntelMQ. This behavior is specific to the Shadowserver parser, not a default in IntelMQ.
https://github.com/certtools/intelmq/blob/e86912f6740ea1592f531fbaa9713e1f60...
However, I think "explicit is better than implicit", and this behavior brings no advantage, only potential confusion, as in this case.
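For illustration, the effect is roughly the following (a minimal sketch, not the actual parser code; the set of harmonization fields is just a placeholder):

# Minimal sketch of the shortcut behavior described above, not the actual
# Shadowserver parser code: output field names unknown to the IntelMQ
# harmonization are silently prefixed with "extra.".
KNOWN_HARMONIZATION_FIELDS = {"destination.url", "source.ip"}  # illustrative subset

def resolve_output_field(name: str) -> str:
    """Return the event key the parser would actually write to."""
    if name in KNOWN_HARMONIZATION_FIELDS or name.startswith("extra."):
        return name
    # "user_agent" is not a harmonization field, so it ends up as "extra.user_agent"
    return "extra." + name

print(resolve_output_field("user_agent"))       # extra.user_agent
print(resolve_output_field("destination.url"))  # destination.url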
The other mapping that seemed odd was in Sinkhole Events HTTP IPv4 & IPv6 (and in Microsoft Sinkhole Events HTTP IPv4):
[ "destination.url", "http_url", "convert_http_host_and_url", true ],
I interpret here that the optional input field "http_url" should be mapped by the Shadowserver parser bot to "destination.url" on output, but we seem to get it mapped to "extra.http_url" instead.
That's also how I'd interpret it, but I don't have any further insight (or data/examples) here.
Best regards
Sebastian
Hi Sebastian,
Thanks for the clarification. Yes, I would also prefer explicit behaviour over implicit. In addition, this implicitness makes it hard to use the intelmq.json description of the feeds as the baseline for checking whether the reported events truly contain the expected fields and nothing else. I'd guess it also indirectly hampers the adoption of IntelMQ. Maybe someone else can clarify what benefits this implicitness brings.
Br, Mika
Hello,
I will update the user_agent mapping for Sandbox URL to be more explicit in the next schema update.
The reason that http_url is commonly not mapped to destination.url for the Sinkhole Events HTTP reports is that the value is often not a fully qualified URL, e.g. "/index.php" without any other context, which fails validation for the type "URL" as specified in the harmonization configuration. When the value fails validation, it is added to extra instead.
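Purely as an illustration, that fallback behaves roughly like the sketch below; the URL check is a simplified stand-in for the harmonization "URL" type, not the parser's actual code.

# Rough sketch of the fallback described above (illustrative only):
# if the value does not validate as a URL, it is kept under "extra.http_url"
# instead of the mapped field "destination.url".
from urllib.parse import urlparse

def looks_like_url(value: str) -> bool:
    """Simplified stand-in for the harmonization 'URL' type validation."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)

def map_http_url(event: dict, value: str) -> None:
    if looks_like_url(value):
        event["destination.url"] = value
    else:
        # "/index.php" has no scheme or host, so validation fails
        event["extra.http_url"] = value

event = {}
map_http_url(event, "/index.php")
print(event)  # {'extra.http_url': '/index.php'}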
Regards,
Jason
Hi Jason & all,
Thanks for the clarification. I had taken intelmq.json for a spec, but based on what you say it's more like a memorandum of intent. The more undocumented special cases lurk between the lines of intelmq.json, the less useful it becomes. My strong recommendation would be: please document all feed input fields together with their corresponding output fields. This should be a one-to-one mapping, and the Shadowserver parser bot should respect it. No other fields should magically be added by the parser. Otherwise, the only way to check for event coherence is through event sampling, something I'd dearly like to avoid.
As for the cases I brought up for discussion, imho there are two ways of solving them so that intelmq.json becomes a true spec: document both possible output field names as optional, or, better, merge the two possible field names into one and improve the parser's converter function to handle both cases.
Following the latter option, "http_url" could be output by the parser in a single field named "extra.partial_or_complete_url" (thus getting rid of the lottery between "extra.http_url" and "destination.url"), with the parser's converter function accepting both partial and complete URLs. In other words, we'd have something like the following for Sinkhole Events HTTP and friends:
[ "extra.", "partial_or_complete_url", "convert_to_possibly_incomplete_url" ],
For Sandbox URL, we could stick to "extra.user_agent" without exceptions:
[ "extra.", "user_agent", "validate_to_none" ],
This morning's harvest from our checker is:
Shadowserver/NTP-Version: extra.source.naics, extra.source.sector
Shadowserver/SSL-POODLE-Vulnerable-Servers IPv4: extra.source.naics, extra.source.sector
but there is no trace of these fields in intelmq.json :-(. Maybe you could include them as well in the next round of schema updates?
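For context, such a coherence check can be sketched roughly as follows. This is illustrative only; it assumes intelmq.json lists the output field as the first element of each required_fields/optional_fields entry, as in the excerpts above, and that the pipeline itself adds a few well-known keys.

# Illustrative sketch of an event-coherence check against intelmq.json.
# Assumptions, not verified against the real schema: a feed name maps to a
# block whose "required_fields"/"optional_fields" entries list the IntelMQ
# output field first, plus optional "constant_fields".
import json

PIPELINE_KEYS = {"feed.name", "feed.url", "raw", "time.observation"}  # adjust to your setup

def expected_fields(schema_path: str, feed: str) -> set:
    with open(schema_path, encoding="utf-8") as handle:
        schema = json.load(handle)
    block = schema[feed]
    fields = {entry[0]
              for section in ("required_fields", "optional_fields")
              for entry in block.get(section, [])}
    fields.update(block.get("constant_fields", {}))
    return fields

def unexpected_keys(event: dict, allowed: set) -> set:
    """Event keys that neither the mapping nor the pipeline explains."""
    return {key for key in event if key not in allowed | PIPELINE_KEYS}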
And the above is only intended to improve things, not as criticism. I, like the rest of the security community, appreciate and value the work of Shadowserver.
Br, Mika