[IntelMQ-dev] Shadowserver parser: Bad mapping for malware events

Thomas Hungenberg th at cert-bund.de
Mon Jan 29 09:49:08 CET 2024


Hi all,

On 26.01.24 15:30, Sebix wrote:
> Originally, the intended use of classification.identifier and malware.name was:
> - malware.name contained the original (and unprocessed) malware name. It was as specific as possible. It can have the malware variant. For example, 
> "b157-rL".
> - The classification.* fields should be usable for aggregation, de-duplication, statistics etc.
> - For malware events, the parsers could write the malware family (e.g. "zeus") or the malware name to the identifier.
> - The family took precedence, but if not known, the more specific malware.name could be used instead.
> - It was always up to the user to replace the identifier with a more generic malware family, e.g. using the public malware name mapping and malpedia.
> 
> At least until 2022, IntelMQ and all its parsers fit this concept. It may still be the case, given the recent significant changes.

@Sebastian: Thanks for summarizing this well-proven concept!

The changes in the Shadowserver parser config must have happened sometime between January and August 2022.
Most likely as part of the adaptation to changes in the Shadowserver feeds, such as the move from "botnet drone" to "sinkhole events"?

In January 2022, the original (unprocessed) malware name ("infection" or "type") was still written to malware.name and "family" to extra.
classification.identifier was left blank and could be set, e.g., by a modify expert using a malware name mapping:

==============================
drone = {
     'optional_fields': [
         ('malware.name', 'infection'),
         ('extra.', 'family', validate_to_none),
     ],
     'constant_fields': {
         # classification.identifier will be set to (harmonized) malware name by modify expert
     },
==============================

See <https://github.com/certtools/intelmq/blob/747100f6ee6519a44cd157fe0b6c98f4b3585821/intelmq/bots/parsers/shadowserver/_config.py>

This fits the concept mentioned above.

However, in August 2022, "infection" was no longer stored in malware.name but used as classification.identifier, and malware.name was set to "family":

==============================
event_sinkhole = {
      'optional_fields': [
          ('classification.identifier', 'infection', validate_to_none),
          ('malware.name', 'family', validate_to_none),
==============================

See <https://github.com/certtools/intelmq/blob/1e4a16c5594e88461f2eccad87d2ea3b62e7c955/intelmq/bots/parsers/shadowserver/_config.py>

Unfortunately, this is the opposite of the well-proven concept.


With the changes I proposed last week (2024-01-26), we return to the former well-proven concept by storing
"infection" (or "type") in malware.name and "family" in "extra.family", as was done until 2022.
This makes the Shadowserver parser consistent with other parsers for malware events (like ctip or anubis) again.

Additionally, we store "infection" (or "type") in classification.identifier as well
to make sure every event processed by the parser has a classification.identifier.
However, the classification.identifier can later be replaced e.g. with a harmonized malware name using the malware name mapping.
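
To illustrate the intended result, here is a rough, self-contained sketch. It is not the actual patch or IntelMQ code: validate_to_none is a simplified stand-in for the parser's helper, and event_sinkhole_sketch, apply_mapping, harmonize_identifier, NAME_MAPPING and the sample names are all made up for this example. It only assumes the tuple layout seen in the excerpts above (IntelMQ field, feed column, converter), with 'extra.' getting the column name appended.

```python
# Illustrative sketch only; not the actual IntelMQ parser code or the proposed patch.


def validate_to_none(value):
    """Simplified stand-in: treat missing or empty values as None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


# Hypothetical mapping in the style of _config.py:
# (IntelMQ field, feed column, converter)
event_sinkhole_sketch = {
    'optional_fields': [
        ('classification.identifier', 'infection', validate_to_none),
        ('malware.name', 'infection', validate_to_none),
        ('extra.', 'family', validate_to_none),
    ],
}


def apply_mapping(mapping, row):
    """Build an event dict from one feed row using the mapping above."""
    event = {}
    for field, column, convert in mapping['optional_fields']:
        value = convert(row.get(column))
        if value is None:
            continue  # optional field: skip if the feed has no value
        # A trailing dot means "append the column name": 'extra.' -> 'extra.family'
        key = field + column if field.endswith('.') else field
        event[key] = value
    return event


# Made-up name mapping, standing in for a modify expert rule set
NAME_MAPPING = {'example-variant-a': 'example-family'}


def harmonize_identifier(event, name_mapping):
    """Replace the identifier with a harmonized name; malware.name stays untouched."""
    harmonized = dict(event)
    name = event.get('malware.name')
    if name is not None and name.lower() in name_mapping:
        harmonized['classification.identifier'] = name_mapping[name.lower()]
    return harmonized


row = {'infection': 'Example-Variant-A', 'family': 'examplefam'}
event = apply_mapping(event_sinkhole_sketch, row)
final = harmonize_identifier(event, NAME_MAPPING)
```

After the parser step, classification.identifier and malware.name both hold the original "infection" value. After the mapping step, only classification.identifier changes, so the unprocessed name remains available in malware.name.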


Kind regards
Thomas


