[IntelMQ-dev] Shadowserver parser: Bad mapping for malware events

Thomas Hungenberg th at cert-bund.de
Wed Jan 24 10:28:55 CET 2024


Hi all,

the parsers for malware events provided by different sources usually store
the malware name in malware.name and leave classification.identifier blank
(or set it to the feed's name).
When the malware name mapping is used, it subsequently writes a harmonized
malware name to classification.identifier. The end result is the original name
in malware.name and the harmonized name in classification.identifier.
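
To illustrate the flow described above, here is a minimal sketch in plain
Python. It does not use IntelMQ itself; the RULES table and the harmonize()
helper are hypothetical stand-ins for the malware name mapping, and the sample
values are made up.

```python
import re

# Hypothetical mapping rules: (regex matched against malware.name, harmonized name).
# The real malware name mapping is far larger; these entries are illustrative only.
RULES = [
    (re.compile(r"^zeus.*", re.IGNORECASE), "zeus"),
    (re.compile(r"^conficker.*", re.IGNORECASE), "conficker"),
]


def harmonize(event: dict) -> dict:
    """Return a copy of the event with classification.identifier derived from malware.name."""
    result = dict(event)
    name = result.get("malware.name")
    if name is None:
        # Nothing to map: the event is returned unchanged.
        return result
    for pattern, identifier in RULES:
        if pattern.match(name):
            # The original name stays in malware.name; only the identifier changes.
            result["classification.identifier"] = identifier
            break
    return result


# A parser typically leaves the identifier blank or sets it to the feed's name.
event = {"malware.name": "Zeus-P2P", "classification.identifier": "some-feed"}
harmonized = harmonize(event)
```

After the call, harmonized still carries the original name in malware.name,
while classification.identifier holds the harmonized name.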

Formerly (in the version we initially provided), the Shadowserver parser
also stored the malware name in malware.name; see e.g. line 387 of
<https://github.com/certtools/intelmq/blob/c61ff2fd4232d6937f3815377b75f682a6fcf790/intelmq/bots/parsers/shadowserver/_config.py>

However, for some time now the Shadowserver parser has been writing the malware
name ("infection") to classification.identifier and "family" to malware.name
instead. This is bad for several reasons:
- it is not consistent with parsers for other malware feeds
- it breaks deduplicators matching on malware.name
- the malware name mapping overwrites classification.identifier with the
   value of "family" (which often is empty)
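
The difference between the two field assignments can be sketched in plain
Python as follows. This is a simplification, not the parser's actual code: the
column names are taken from the Shadowserver feed, the sample row is made up,
apply_mapping() is a hypothetical helper, and storing "family" under
extra.family is an assumption about how the 'extra.' prefix is expanded.

```python
# Sample feed row; the values are invented for illustration.
row = {"infection": "avalanche-andromeda", "family": "", "tag": "avalanche"}

# (event field, feed column) pairs, reduced to the fields discussed here.
CURRENT_MAPPING = [
    ("classification.identifier", "infection"),
    ("malware.name", "family"),
]
PROPOSED_MAPPING = [
    ("malware.name", "infection"),
    ("extra.family", "family"),  # assumed expansion of the 'extra.' prefix
]


def apply_mapping(mapping, row):
    """Copy non-empty columns into event fields; empty values are dropped."""
    return {field: row[column] for field, column in mapping if row.get(column)}


current = apply_mapping(CURRENT_MAPPING, row)
proposed = apply_mapping(PROPOSED_MAPPING, row)
```

With an empty "family" column, the current assignment yields an event without
malware.name at all, so a deduplicator matching on malware.name has nothing to
compare. The proposed assignment keeps the infection name in malware.name, as
other malware parsers do.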


Here is a patch (against the version included with IntelMQ 3.2.1) that fixes
this problem and makes malware events parsed by the Shadowserver parser
consistent with other parsers again:

===============================================
diff --git a/_config.py.orig b/_config.py
index bea3d0c..431bcb9 100644
--- a/_config.py.orig
+++ b/_config.py
@@ -867,10 +867,9 @@ event_sinkhole = {
          ('source.port', 'src_port', convert_int),
      ],
      'optional_fields': [
-        ('classification.identifier', 'infection', validate_to_none),
-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),
          ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
          ('protocol.transport', 'protocol'),
          ('source.asn', 'src_asn', invalidate_zero),
          ('source.geolocation.cc', 'src_geo'),
@@ -899,6 +898,7 @@ event_sinkhole = {
      'constant_fields': {
          'classification.taxonomy': 'malicious-code',
          'classification.type': 'infected-system',
+        'classification.identifier': 'sinkhole-events',
      },
  }

@@ -944,10 +944,9 @@ event_sinkhole_http = {
          ('source.port', 'src_port', convert_int),
      ],
      'optional_fields': [
-        ('classification.identifier', 'tag'),
-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),
          ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
          ('protocol.transport', 'protocol'),
          ('source.asn', 'src_asn', invalidate_zero),
          ('source.geolocation.cc', 'src_geo'),
@@ -982,6 +981,7 @@ event_sinkhole_http = {
      'constant_fields': {
          'classification.taxonomy': 'malicious-code',
          'classification.type': 'infected-system',
+        'classification.identifier': 'sinkhole-http-events',
          'protocol.application': 'http',
      },
  }
@@ -992,9 +992,9 @@ event_sinkhole_http_referer = {
          ('time.source', 'timestamp', add_UTC_to_timestamp),
      ],
      'optional_fields': [
-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),
          ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
          ('protocol.transport', 'protocol'),
          ('extra.', 'http_referer_ip', validate_ip),
          ('extra.', 'http_referer_port', convert_int),
===============================================


Kind regards
Thomas


