Re: [IntelMQ-dev] Shadowserver parser: Bad mapping for malware events

30 Jan 2024

Hi all,
Thanks for the comments. I've forwarded the thread to Shadowserver, and 
they have also just joined the list (represented by @elsif, who works on 
the IntelMQ integration), so we can discuss the feedback directly.
@Thomas - to answer the question about completed schema changes: I spoke 
with elsif about that a few weeks ago, and the schema changelog is 
available at 
https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/compl...
Best regards
// Kamil Mańkowski mankowski@cert.at - T: +43 676 898 298 7204
// CERT Austria - https://www.cert.at/
// CERT.at GmbH, FB-Nr. 561772k, HG Wien
On 1/29/24 09:49, Thomas Hungenberg wrote:
...
Hi all,
On 26.01.24 15:30, Sebix wrote:
...
Originally, the intended use of classification.identifier and 
malware.name was:

malware.name contained the original (and unprocessed) malware name.
It was as specific as possible. It can have the malware variant. For 
example, "b157-rL".

The classification.* fields should be usable for aggregation,
de-duplication, statistics etc.
For malware events, the parsers could write the malware family (e.g.
"zeus") or the malware name to the identifier.
The family took precedence, but if not known, the more specific
malware.name could be used instead.
It was always up to the user to replace the identifier with a more
generic malware family, e.g. using the public malware name mapping and 
malpedia.
At least until 2022, IntelMQ and all its parsers fit this concept. It 
may still be the case, given the recent significant changes.
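The precedence rule described above can be sketched in a few lines of Python. This is only an illustration of the concept, not code from IntelMQ: the function name pick_identifier and its two arguments are hypothetical, and stripping whitespace is an assumption made for the example.

```python
from typing import Optional


def pick_identifier(family: Optional[str], malware_name: Optional[str]) -> Optional[str]:
    """Choose a value for classification.identifier (hypothetical helper).

    The malware family takes precedence; if it is not known, the more
    specific malware name is used instead. Returns None if neither is set.
    """
    for candidate in (family, malware_name):
        # Treat None, empty and whitespace-only values as "not known".
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

For example, pick_identifier("zeus", "b157-rL") yields the family, while pick_identifier(None, "b157-rL") falls back to the specific name.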
@Sebastian: Thanks for summarizing this well-proven concept!
The changes in the Shadowserver parser config must have happened 
sometime between January and August 2022,
most likely with the adaptation to the changes in the Shadowserver feeds 
like the move from "botnet drone" to "sinkhole events".
In January 2022, the original (unprocessed) malware name ("infection" or 
"type") was still written to malware.name and "family" to extra.
classification.identifier was left blank and could be set e.g. with a 
malware name mapping modify expert:
==============================
drone = {
     'optional_fields': [
         ('malware.name', 'infection'),
         ('extra.', 'family', validate_to_none),
     ],
     'constant_fields': {
         # classification.identifier will be set to (harmonized) malware name by modify expert
     },
==============================
See 
https://github.com/certtools/intelmq/blob/747100f6ee6519a44cd157fe0b6c98f4b3585821/intelmq/bots/parsers/shadowserver/_config.py
This fits the concept mentioned above.
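To illustrate how a blank classification.identifier could be filled from malware.name afterwards, here is a minimal sketch in plain Python. It does not use the modify expert's real configuration format or API; the rule list, the regular expressions and the function name set_identifier are all made up for this example.

```python
import re

# Hypothetical mapping rules: (pattern matched against malware.name, identifier).
# Real deployments would use the public malware name mapping instead.
RULES = [
    (re.compile(r"^(zeus|zbot)"), "zeus"),
    (re.compile(r"^conficker"), "conficker"),
]


def set_identifier(event: dict) -> dict:
    """Return a copy of the event with classification.identifier filled in.

    An identifier that is already set is left untouched; otherwise the
    first rule matching malware.name decides the value.
    """
    result = dict(event)
    if result.get("classification.identifier"):
        return result
    name = result.get("malware.name", "")
    for pattern, identifier in RULES:
        if pattern.search(name):
            result["classification.identifier"] = identifier
            break
    return result
```

With the old "drone" mapping, an event carrying only malware.name would pass through such a step and get its harmonized identifier there.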
However, in August 2022 "infection" was no longer stored in malware.name 
but used as classification.identifier and malware.name was set to "family":
==============================
event_sinkhole = {
      'optional_fields': [
          ('classification.identifier', 'infection', validate_to_none),
          ('malware.name', 'family', validate_to_none),
==============================
See 
https://github.com/certtools/intelmq/blob/1e4a16c5594e88461f2eccad87d2ea3b62e7c955/intelmq/bots/parsers/shadowserver/_config.py
Unfortunately, this is the opposite of the well-proven concept.
With the changes I proposed last week (2024-01-26), we return to the 
former well-proven concept by storing
"infection" (or "type") in malware.name and "family" in "extra.family", 
as was done until 2022.
This makes the Shadowserver parser consistent with other parsers for 
malware events (like ctip or anubis) again.
Additionally, we store "infection" (or "type") in 
classification.identifier as well
to make sure every event processed by the parser has a 
classification.identifier.
However, the classification.identifier can later be replaced e.g. with a 
harmonized malware name using the malware name mapping.
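The resulting field assignment can be sketched as follows. This is a simplified illustration and not the parser's actual code: the function name map_row is hypothetical, and falling back from "infection" to "type" within a single row is an assumption made to keep the example short (in the feeds, the column name depends on the report type).

```python
def map_row(row: dict) -> dict:
    """Map one Shadowserver report row to event fields (illustrative only)."""
    event = {}
    # Assumption: use "infection" if present, otherwise "type".
    name = row.get("infection") or row.get("type")
    if name:
        # The original malware name goes to malware.name ...
        event["malware.name"] = name
        # ... and also to the identifier, so every event has one.
        event["classification.identifier"] = name
    family = row.get("family")
    if family:
        event["extra.family"] = family
    return event
```

A later malware name mapping step is then free to overwrite classification.identifier while malware.name keeps the original value.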
Kind regards
Thomas
