Shadowserver parser: Bad mapping for malware events

Thomas Hungenberg

24 Jan 2024

9:28 a.m.

Hi all,

the parsers for malware events provided by different sources usually store the malware name in malware.name and classification.identifier is left blank (or set to the feed's name). When using the malware name mapping, a harmonized malware name is subsequently written to classification.identifier. So finally you have the original name in malware.name and the harmonized name in classification.identifier.
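The convention can be illustrated with a minimal sketch in plain Python. It is not IntelMQ's Malware Name Mapping expert: events are plain dicts, and the rule table `HYPOTHETICAL_RULES` and the helper `apply_malware_name_mapping` are invented for illustration. The point is only the data flow: the parser keeps the source's original name in malware.name, and the mapping step writes a harmonized name to classification.identifier.

```python
import re
from typing import Optional

# Hypothetical rules: (pattern over the raw malware name, harmonized name).
# These entries are invented for illustration only.
HYPOTHETICAL_RULES = [
    (re.compile(r"^(andromeda|gamarue)", re.IGNORECASE), "andromeda"),
    (re.compile(r"^conficker", re.IGNORECASE), "conficker"),
]


def harmonize(raw_name: str) -> Optional[str]:
    """Return the harmonized name for raw_name, or None if no rule matches."""
    for pattern, harmonized in HYPOTHETICAL_RULES:
        if pattern.search(raw_name):
            return harmonized
    return None


def apply_malware_name_mapping(event: dict) -> dict:
    """Write the harmonized form of malware.name to classification.identifier.

    malware.name itself is left untouched, so both values end up in the event.
    """
    raw_name = event.get("malware.name")
    if raw_name is None:
        return event
    harmonized = harmonize(raw_name)
    if harmonized is not None:
        event["classification.identifier"] = harmonized
    return event


if __name__ == "__main__":
    parsed = {"malware.name": "Gamarue.B", "classification.identifier": "sinkhole-events"}
    print(apply_malware_name_mapping(parsed))
    # {'malware.name': 'Gamarue.B', 'classification.identifier': 'andromeda'}
```

Whether a real deployment overwrites an already set classification.identifier depends on how the mapping step is configured; the sketch simply always overwrites on a match.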

Formerly (in the version we initially provided), the Shadowserver parser also stored the malware name in malware.name, see e.g. https://github.com/certtools/intelmq/blob/c61ff2fd4232d6937f3815377b75f682a6fcf790/intelmq/bots/parsers/shadowserver/_config.py line 387

However, for some time now the Shadowserver parser has instead written the malware name ("infection") to classification.identifier and "family" to malware.name. This is bad for several reasons:

- it is not consistent with parsers for other malware feeds
- it breaks deduplicators matching on malware.name
- the malware name mapping overwrites classification.identifier with the value of "family" (which is often empty)
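The deduplicator point can be shown with a small sketch. The sample rows, the simplified `validate_to_none` stand-in and the `dedup_key` helper are assumptions for illustration; a real deduplicator is configured with its own list of fields. When "family" is empty, the current mapping leaves malware.name unset, so two different infections seen from the same IP collapse into one key; with "infection" stored in malware.name they stay distinct.

```python
from typing import Optional


def validate_to_none(value: str) -> Optional[str]:
    """Simplified stand-in: treat empty values as missing."""
    return value or None


def parse_current(row: dict) -> dict:
    """Current mapping: infection -> classification.identifier, family -> malware.name."""
    event = {"source.ip": row["src_ip"]}
    for target, column in (("classification.identifier", "infection"),
                           ("malware.name", "family")):
        value = validate_to_none(row[column])
        if value is not None:
            event[target] = value
    return event


def parse_proposed(row: dict) -> dict:
    """Proposed mapping: infection -> malware.name, family -> extra.family."""
    event = {"source.ip": row["src_ip"]}
    for target, column in (("malware.name", "infection"),
                           ("extra.family", "family")):
        value = validate_to_none(row[column])
        if value is not None:
            event[target] = value
    return event


def dedup_key(event: dict) -> tuple:
    """Hypothetical deduplication key: source IP plus malware name."""
    return (event.get("source.ip"), event.get("malware.name"))


# Invented sample rows (documentation IP range).
rows = [
    {"src_ip": "192.0.2.10", "infection": "andromeda", "family": ""},
    {"src_ip": "192.0.2.10", "infection": "necurs", "family": ""},
]

current_keys = {dedup_key(parse_current(r)) for r in rows}
proposed_keys = {dedup_key(parse_proposed(r)) for r in rows}
print(len(current_keys), len(proposed_keys))  # 1 2
```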

Here is a patch (for the version included with IntelMQ 3.2.1) to fix this problem and make malware events parsed by the Shadowserver parser consistent with other parsers again:

===============================================
diff --git a/_config.py.orig b/_config.py
index bea3d0c..431bcb9 100644
--- a/_config.py.orig
+++ b/_config.py
@@ -867,10 +867,9 @@ event_sinkhole = {
         ('source.port', 'src_port', convert_int),
     ],
     'optional_fields': [
-        ('classification.identifier', 'infection', validate_to_none),
-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),
         ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
         ('protocol.transport', 'protocol'),
         ('source.asn', 'src_asn', invalidate_zero),
         ('source.geolocation.cc', 'src_geo'),
@@ -899,6 +898,7 @@ event_sinkhole = {
     'constant_fields': {
         'classification.taxonomy': 'malicious-code',
         'classification.type': 'infected-system',
+        'classification.identifier': 'sinkhole-events',
     },
 }

@@ -944,10 +944,9 @@ event_sinkhole_http = {
         ('source.port', 'src_port', convert_int),
     ],
     'optional_fields': [
-        ('classification.identifier', 'tag'),
-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),
         ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
         ('protocol.transport', 'protocol'),
         ('source.asn', 'src_asn', invalidate_zero),
         ('source.geolocation.cc', 'src_geo'),
@@ -982,6 +981,7 @@ event_sinkhole_http = {
     'constant_fields': {
         'classification.taxonomy': 'malicious-code',
         'classification.type': 'infected-system',
+        'classification.identifier': 'sinkhole-http-events',
         'protocol.application': 'http',
     },
 }
@@ -992,9 +992,9 @@ event_sinkhole_http_referer = {
         ('time.source', 'timestamp', add_UTC_to_timestamp),
     ],
     'optional_fields': [
-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),
         ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
         ('protocol.transport', 'protocol'),
         ('extra.', 'http_referer_ip', validate_ip),
         ('extra.', 'http_referer_port', convert_int),
===============================================
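For readers less familiar with _config.py: each optional_fields entry is a tuple of (target field, source column, optional conversion function). The sketch below is a simplified re-implementation, not the parser's real code. It assumes, as the patch implies, that an 'extra.' target stores the value under extra.<column>, and its `validate_to_none` is only a stand-in for the helper of the same name. It applies an excerpt of the patched event_sinkhole entries to an invented row.

```python
from typing import Optional


def validate_to_none(value: str) -> Optional[str]:
    """Stand-in for the parser helper: empty or placeholder values become None."""
    return None if (not value or value in {"0", "unknown"}) else value


# Excerpt of event_sinkhole after the patch.
PATCHED_OPTIONAL_FIELDS = [
    ("malware.name", "infection", validate_to_none),
    ("extra.", "family", validate_to_none),
    ("extra.", "tag", validate_to_none),
    ("protocol.transport", "protocol"),
]

PATCHED_CONSTANT_FIELDS = {
    "classification.taxonomy": "malicious-code",
    "classification.type": "infected-system",
    "classification.identifier": "sinkhole-events",
}


def apply_mapping(row: dict, optional_fields: list, constant_fields: dict) -> dict:
    """Build an event dict from one CSV row (simplified sketch)."""
    event = dict(constant_fields)
    for entry in optional_fields:
        target, column = entry[0], entry[1]
        # Entries without a conversion function keep the raw string.
        convert = entry[2] if len(entry) > 2 else str
        if column not in row:
            continue
        value = convert(row[column])
        if value is None:
            continue
        # An 'extra.' target keeps the source column name, e.g. extra.family.
        key = target + column if target == "extra." else target
        event[key] = value
    return event


row = {"infection": "andromeda", "family": "gamarue", "tag": "", "protocol": "tcp"}
print(apply_mapping(row, PATCHED_OPTIONAL_FIELDS, PATCHED_CONSTANT_FIELDS))
```

The empty "tag" is dropped by `validate_to_none`, so the resulting event carries malware.name, extra.family and protocol.transport next to the three constant classification fields.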

Kind regards
Thomas

Kamil Mankowski

24 Jan 2024

10:23 a.m.

Hi,

thanks for the patch. Could you please check whether it is correct in the upcoming ShadowServer parser mapping? https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/intel...

I'm pretty sure I was working with them to clean up such discrepancies, but we may have missed something. I don't want the next release to revert your changes unintentionally.

Best regards

// Kamil Mańkowski mankowski@cert.at - T: +43 676 898 298 7204
// CERT Austria - https://www.cert.at/
// CERT.at GmbH, FB-Nr. 561772k, HG Wien

IntelMQ-dev mailing list
https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev
https://intelmq.readthedocs.io/

Thomas Hungenberg

11:15 a.m.

Hi Kamil,

I had a quick look at the mapping. Unfortunately, it is not correct.

The following changes should be applied to the mapping for ALL sinkhole-related feeds:

==================================================
   "constant_fields" : {
+     "classification.identifier" : "example: event4-sinkhole",   # set CI to feed name (with dashes) like with other feeds
      "classification.taxonomy" : "malicious-code",
      "classification.type" : "infected-system"
   },

   "optional_fields" : [
      [
-        "classification.identifier",
+        "malware.name",
         "infection",
         "validate_to_none"
      ],
      [
-        "malware.name",
+        "extra.",
         "family",
         "validate_to_none"
      ],
-     [
-        "extra.",
-        "infection",
-        "validate_to_none"
-     ],
==================================================

I also noticed that classification.taxonomy and classification.type are set to "other" for some sinkhole feeds like this:

"event_sinkhole_http_referer" : {
   "constant_fields" : {
      "classification.identifier" : "event-sinkhole-http-referer",
      "classification.taxonomy" : "other",
      "classification.type" : "other"

This should be changed to:

"classification.taxonomy" : "malicious-code",
"classification.type" : "infected-system",
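The requested schema changes can be expressed as a transformation of one schema entry. `fix_sinkhole_schema` is a hypothetical helper, not part of IntelMQ or the Shadowserver report_schema repository. It assumes the JSON layout quoted above (optional_fields as lists of [target, column, conversion]), only handles the case shown in the diff (infection in classification.identifier, family in malware.name), and runs on an invented sample entry whose taxonomy is deliberately set to "other".

```python
import copy


def fix_sinkhole_schema(schema: dict) -> dict:
    """Return a copy of a sinkhole schema entry with the proposed changes applied."""
    fixed = copy.deepcopy(schema)
    constants = fixed["constant_fields"]
    # Set classification.identifier to the feed name with dashes, e.g. event4-sinkhole.
    constants["classification.identifier"] = fixed["file_name"].replace("_", "-")
    # Sinkhole feeds describe infected systems, not "other".
    constants["classification.taxonomy"] = "malicious-code"
    constants["classification.type"] = "infected-system"

    new_fields = []
    for entry in fixed["optional_fields"]:
        target, column = entry[0], entry[1]
        if target == "classification.identifier" and column == "infection":
            new_fields.append(["malware.name", *entry[1:]])
        elif target == "malware.name" and column == "family":
            new_fields.append(["extra.", *entry[1:]])
        elif target == "extra." and column == "infection":
            continue  # redundant once infection is stored in malware.name
        else:
            new_fields.append(entry)
    fixed["optional_fields"] = new_fields
    return fixed


# Invented sample entry in the schema's JSON layout.
schema = {
    "constant_fields": {
        "classification.taxonomy": "other",
        "classification.type": "other",
    },
    "file_name": "event4_sinkhole",
    "optional_fields": [
        ["classification.identifier", "infection", "validate_to_none"],
        ["malware.name", "family", "validate_to_none"],
        ["extra.", "tag", "validate_to_none"],
        ["extra.", "infection", "validate_to_none"],
    ],
}

print(fix_sinkhole_schema(schema)["optional_fields"])
```

Working on a deep copy keeps the input entry unchanged, which makes it easy to diff the original and the fixed schema side by side.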

Kind regards
Thomas

Kamil Mankowski

12:02 p.m.

Thanks, I'm forwarding it to ShadowServer for the corrections.

Best regards

Kamil Mankowski

25 Jan 2024

7:36 a.m.

Hi Thomas,

I've got an answer from ShadowServer with the proposed mapping changes. Could you have a look at whether this diff solves the issue?

In my eyes it still mixes the source of "malware.name": sometimes it's 'family', sometimes 'infection'. But this may also reflect differences in the data available in the reports.

event4_microsoft_sinkhole:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event4-microsoft-sinkhole",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system"
     },
***************
*** 7,17 ****
     "file_name" : "event4_microsoft_sinkhole",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "infection",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 8,13 ----

event4_microsoft_sinkhole_http:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event4-microsoft-sinkhole-http",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system",
        "protocol.application" : "http"
***************
*** 8,17 ****
     "file_name" : "event4_microsoft_sinkhole_http",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 9,14 ----

event6_sinkhole:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event6-sinkhole",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system"
     },
***************
*** 7,17 ****
     "file_name" : "event6_sinkhole",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "infection",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 8,13 ----

event6_sinkhole_http:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event6-sinkhole-http",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system",
        "protocol.application" : "http"
***************
*** 8,17 ****
     "file_name" : "event6_sinkhole_http",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 9,14 ----

event6_sinkhole_http_referer:
***************
*** 1,8 ****
  {
     "constant_fields" : {
        "classification.identifier" : "event6-sinkhole-http-referer",
!       "classification.taxonomy" : "other",
!       "classification.type" : "other"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv6",
     "file_name" : "event6_sinkhole_http_referer",
--- 1,8 ----
  {
     "constant_fields" : {
        "classification.identifier" : "event6-sinkhole-http-referer",
!       "classification.taxonomy" : "malicious-code",
!       "classification.type" : "infected-system"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv6",
     "file_name" : "event6_sinkhole_http_referer",

event_honeypot_brute_force:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "honeypot-brute-force",
        "classification.taxonomy" : "intrusion-attempts",
        "classification.type" : "brute-force"
     },
***************
*** 7,16 ****
     "file_name" : "event4_honeypot_brute_force",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "application"
-       ],
-       [
           "destination.account",
           "username",
           "validate_to_none"
--- 8,13 ----

event_honeypot_darknet:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "honeypot-darknet",
        "classification.taxonomy" : "other",
        "classification.type" : "other"
     },
***************
*** 7,17 ****
     "file_name" : "event4_honeypot_darknet",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "infection",
           "validate_to_none"
--- 8,13 ----

event_sinkhole:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "sinkhole",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system"
     },
***************
*** 7,17 ****
     "file_name" : "event4_sinkhole",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "infection",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 8,13 ----

event_sinkhole_http:
***************
*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "sinkhole-http",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system",
        "protocol.application" : "http"
***************
*** 8,17 ****
     "file_name" : "event4_sinkhole_http",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 9,14 ----

event_sinkhole_http_referer:
***************
*** 1,8 ****
  {
     "constant_fields" : {
        "classification.identifier" : "sinkhole-http-referer",
!       "classification.taxonomy" : "other",
!       "classification.type" : "other"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv4",
     "file_name" : "event4_sinkhole_http_referer",
--- 1,8 ----
  {
     "constant_fields" : {
        "classification.identifier" : "sinkhole-http-referer",
!       "classification.taxonomy" : "malicious-code",
!       "classification.type" : "infected-system"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv4",
     "file_name" : "event4_sinkhole_http_referer",
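The mixed malware.name sources can be checked mechanically. A sketch, assuming the schema entries are loaded into a dict keyed by schema name; `malware_name_sources` is a hypothetical helper, and the SCHEMAS excerpt is reduced to some of the malware.name entries visible in the diff above.

```python
from collections import defaultdict


def malware_name_sources(schemas: dict) -> dict:
    """Group schema names by the source column that feeds malware.name."""
    sources = defaultdict(list)
    for name in sorted(schemas):
        for entry in schemas[name].get("optional_fields", []):
            if entry[0] == "malware.name":
                sources[entry[1]].append(name)
    return dict(sources)


# Reduced excerpt: only malware.name entries taken from the diff.
SCHEMAS = {
    "event4_microsoft_sinkhole": {"optional_fields": [["malware.name", "family", "validate_to_none"]]},
    "event6_sinkhole": {"optional_fields": [["malware.name", "family", "validate_to_none"]]},
    "event_honeypot_darknet": {"optional_fields": [["malware.name", "infection", "validate_to_none"]]},
    "event_sinkhole": {"optional_fields": [["malware.name", "family", "validate_to_none"]]},
    "event_sinkhole_http": {"optional_fields": [["malware.name", "family", "validate_to_none"]]},
}

for column, names in malware_name_sources(SCHEMAS).items():
    print(f"{column}: {', '.join(names)}")
```

Run over the full schema set, more than one key in the result means that malware.name is not filled from the same column everywhere.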

Best regards

Thomas Hungenberg

26 Jan 2024

10:01 a.m.

Hi Kamil,

I thought about this again in more detail. The classification attributes should describe the incident, becoming more specific from taxonomy to type to identifier. So for feeds like Open-SNMP, it makes sense to set classification.identifier to the feed's name, like this:

'classification.taxonomy': 'vulnerable',
'classification.type': 'vulnerable-system',
'classification.identifier': 'open-snmp',

However, for malware events, my proposal of setting classification.identifier to the feed's name does not make sense, as a feed name like "event4-microsoft-sinkhole" is not a specific description of the incident itself but rather of the type of source of the information.

So I think it is best to keep writing the malware name ("infection" or "tag") to classification.identifier, as this is a specific description of the individual incident. However, the malware name ("infection" or "tag") also needs to be stored in malware.name for the malware name mapping to work. "family" should instead be stored in extra.
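The proposal above can be illustrated by applying the proposed event_sinkhole_http field list to one report row. This Python sketch is a simplified model, not the Shadowserver parser's code: `apply_mapping` and the sample row are hypothetical, the `validate_to_none` here is a stand-in that only treats empty strings as missing, and it assumes the generic 'extra.' key is completed with the source column name (giving extra.family, extra.tag).

```python
from typing import Optional


def validate_to_none(value: str) -> Optional[str]:
    """Simplified stand-in: treat an empty value as missing."""
    return value if value else None


# Field order follows the proposed event_sinkhole_http mapping in this thread.
OPTIONAL_FIELDS = [
    ('classification.identifier', 'infection', validate_to_none),
    ('malware.name', 'infection', validate_to_none),
    ('extra.', 'tag', validate_to_none),
    ('extra.', 'family', validate_to_none),
]


def apply_mapping(row: dict, fields: list) -> dict:
    """Hypothetical helper: copy report columns into IntelMQ-style event keys."""
    event = {}
    for intelmq_key, column, *converter in fields:
        value = row.get(column, '')
        if converter:
            value = converter[0](value)
        if value is None:
            continue  # an optional field without a value is skipped
        if intelmq_key == 'extra.':
            # assumption: the generic 'extra.' key is completed with the column name
            intelmq_key = 'extra.' + column
        event[intelmq_key] = value
    return event


# Hypothetical report row; "family" is empty, as it often is.
row = {'infection': 'andromeda', 'tag': 'example-tag', 'family': ''}
print(apply_mapping(row, OPTIONAL_FIELDS))
```

The original name ends up in both classification.identifier and malware.name, so a later malware name mapping can overwrite the identifier while malware.name keeps the unprocessed value.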

So the necessary changes for event_sinkhole and event_sinkhole_dns look like this:

-        ('malware.name', 'family', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),

For event_sinkhole_http:

-        ('classification.identifier', 'tag'),
-        ('malware.name', 'family', validate_to_none),
+        ('classification.identifier', 'infection', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
         ('extra.', 'tag', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),

For event_sinkhole_http_referer:

     'optional_fields': [
-        ('malware.name', 'family', validate_to_none),
+        ('classification.identifier', 'infection', validate_to_none),
+        ('malware.name', 'infection', validate_to_none),
-        ('extra.', 'infection', validate_to_none),
+        ('extra.', 'family', validate_to_none),

     'constant_fields': {
-        'classification.taxonomy': 'other',
-        'classification.type': 'other',
-        'classification.identifier': 'sinkhole-http-referer',
+        'classification.taxonomy': 'malicious-code',
+        'classification.type': 'infected-system',
+        'protocol.application': 'http',

For some other feeds like "malware_url", I have also added the missing "validate_to_none" conversion to make them consistent with all other feeds.

Please find attached the corrected patch for _config.py included with IntelMQ 3.2.1 and the complete file.

I will now have a look at the json schema.

Kind regards Thomas

On 25.01.24 08:36, Kamil Mankowski wrote:

...

Hi Thomas,

I've received an answer from ShadowServer with the proposed mapping changes. Could you check whether this diff solves the issue?

In my view, it still mixes the source of "malware.name": sometimes it is 'family', sometimes 'infection'. But this may also reflect differences in the data available in the reports.
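One way to check the mixing noted above is to list, per report, which source column fills malware.name. This Python sketch works on a hypothetical excerpt shaped like the schema entries quoted in this thread ("optional_fields" as [field, column, converter] lists); the presence of a "required_fields" list is an assumption, and `malware_name_sources` is a made-up helper, not part of IntelMQ.

```python
import json

# Hypothetical excerpt shaped like the schema entries quoted below.
SCHEMA = json.loads("""
{
  "event4_sinkhole": {
    "optional_fields": [["malware.name", "family", "validate_to_none"]]
  },
  "event4_sinkhole_http": {
    "optional_fields": [["malware.name", "family", "validate_to_none"]]
  },
  "event4_honeypot_darknet": {
    "optional_fields": [["malware.name", "infection", "validate_to_none"]]
  }
}
""")


def malware_name_sources(schema: dict) -> dict:
    """Return {report name: source column that fills malware.name}."""
    sources = {}
    for report, cfg in schema.items():
        # "required_fields" is assumed to exist alongside "optional_fields"
        entries = cfg.get("required_fields", []) + cfg.get("optional_fields", [])
        for entry in entries:
            if entry[0] == "malware.name":
                sources[report] = entry[1]
    return sources


sources = malware_name_sources(SCHEMA)
for report, column in sorted(sources.items()):
    print(f"{report}: malware.name <- {column}")
print("consistent" if len(set(sources.values())) <= 1 else "mixed")
```

Run against the full schema file, the same loop would show at a glance which reports still take malware.name from "family".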

event4_microsoft_sinkhole:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event4-microsoft-sinkhole",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system"
     },

*** 7,17 ****
     "file_name" : "event4_microsoft_sinkhole",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "infection",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 8,13 ----

event4_microsoft_sinkhole_http:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event4-microsoft-sinkhole-http",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system",
        "protocol.application" : "http"

*** 8,17 ****
     "file_name" : "event4_microsoft_sinkhole_http",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 9,14 ----

event6_sinkhole:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event6-sinkhole",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system"
     },

*** 7,17 ****
     "file_name" : "event6_sinkhole",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "infection",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 8,13 ----

event6_sinkhole_http:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "event6-sinkhole-http",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system",
        "protocol.application" : "http"

*** 8,17 ****
     "file_name" : "event6_sinkhole_http",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 9,14 ----

event6_sinkhole_http_referer:

*** 1,8 ****
  {
     "constant_fields" : {
        "classification.identifier" : "event6-sinkhole-http-referer",
!       "classification.taxonomy" : "other",
!       "classification.type" : "other"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv6",
     "file_name" : "event6_sinkhole_http_referer",
--- 1,8 ----
  {
     "constant_fields" : {
        "classification.identifier" : "event6-sinkhole-http-referer",
!       "classification.taxonomy" : "malicious-code",
!       "classification.type" : "infected-system"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv6",
     "file_name" : "event6_sinkhole_http_referer",

event_honeypot_brute_force:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "honeypot-brute-force",
        "classification.taxonomy" : "intrusion-attempts",
        "classification.type" : "brute-force"
     },

*** 7,16 ****
     "file_name" : "event4_honeypot_brute_force",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "application"
-       ],
-       [
           "destination.account",
           "username",
           "validate_to_none"
--- 8,13 ----

event_honeypot_darknet:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "honeypot-darknet",
        "classification.taxonomy" : "other",
        "classification.type" : "other"
     },

*** 7,17 ****
     "file_name" : "event4_honeypot_darknet",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "infection",
           "validate_to_none"
--- 8,13 ----

event_sinkhole:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "sinkhole",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system"
     },

*** 7,17 ****
     "file_name" : "event4_sinkhole",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "infection",
-          "validate_to_none"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 8,13 ----

event_sinkhole_http:

*** 1,5 ****
--- 1,6 ----
  {
     "constant_fields" : {
+       "classification.identifier" : "sinkhole-http",
        "classification.taxonomy" : "malicious-code",
        "classification.type" : "infected-system",
        "protocol.application" : "http"

*** 8,17 ****
     "file_name" : "event4_sinkhole_http",
     "optional_fields" : [
        [
-          "classification.identifier",
-          "tag"
-       ],
-       [
           "malware.name",
           "family",
           "validate_to_none"
--- 9,14 ----

event_sinkhole_http_referer:

*** 1,8 ****
  {
     "constant_fields" : {
        "classification.identifier" : "sinkhole-http-referer",
!       "classification.taxonomy" : "other",
!       "classification.type" : "other"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv4",
     "file_name" : "event4_sinkhole_http_referer",
--- 1,8 ----
  {
     "constant_fields" : {
        "classification.identifier" : "sinkhole-http-referer",
!       "classification.taxonomy" : "malicious-code",
!       "classification.type" : "infected-system"
     },
     "feed_name" : "Sinkhole-Events-HTTP-Referer IPv4",
     "file_name" : "event4_sinkhole_http_referer",

Best regards

// Kamil Mańkowski mankowski@cert.at - T: +43 676 898 298 7204 // CERT Austria - https://www.cert.at/ // CERT.at GmbH, FB-Nr. 561772k, HG Wien

On 1/24/24 13:02, Kamil Mankowski wrote:

...
Thanks, I'm forwarding it to ShadowServer for the corrections.

Best regards

// Kamil Mańkowski mankowski@cert.at - T: +43 676 898 298 7204 // CERT Austria - https://www.cert.at/ // CERT.at GmbH, FB-Nr. 561772k, HG Wien

On 1/24/24 12:15, Thomas Hungenberg wrote:

...
Hi Kamil,

I had a quick look at the mapping. Unfortunately, it is not correct.

The following changes should be applied to the mapping for ALL sinkhole related feeds:

==================================================
        "constant_fields" : {
           "classification.taxonomy" : "malicious-code",
           "classification.type" : "infected-system"
+          "classification.identifier" : "example: event4-sinkhole", # set CI to feed name (with dashes) like with other feeds
        },

        "optional_fields" : [
           [
-             "classification.identifier",
+             "malware.name",
              "infection",
              "validate_to_none"
           ],
           [
-             "malware.name",
+             "extra.",
              "family",
              "validate_to_none"
           ],
-          [
-             "extra.",
-             "infection",
-             "validate_to_none"
-          ],

I also noticed that classification.taxonomy and classification.type are set to "other" for some sinkhole feeds like this:

"event_sinkhole_http_referer" : {
    "constant_fields" : {
        "classification.identifier" : "event-sinkhole-http-referer",
        "classification.taxonomy" : "other",
        "classification.type" : "other"

This should be changed to:

"classification.taxonomy" : "malicious-code",
"classification.type" : "infected-system",
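A quick way to find the remaining sinkhole reports with taxonomy/type "other" is to compare their constant_fields against the expected pair. This is a sketch over a hypothetical excerpt, not the real schema file; `misclassified_sinkholes` is a made-up helper, and selecting reports by "sinkhole" in the name is a crude assumption.

```python
# Hypothetical excerpt mirroring the constant_fields quoted above.
REPORTS = {
    "event4_sinkhole": {
        "constant_fields": {
            "classification.taxonomy": "malicious-code",
            "classification.type": "infected-system",
        }
    },
    "event4_sinkhole_http_referer": {
        "constant_fields": {
            "classification.taxonomy": "other",
            "classification.type": "other",
        }
    },
}

EXPECTED = {
    "classification.taxonomy": "malicious-code",
    "classification.type": "infected-system",
}


def misclassified_sinkholes(reports: dict) -> list:
    """List sinkhole reports whose constant classification differs from EXPECTED."""
    flagged = []
    for name, cfg in reports.items():
        if "sinkhole" not in name:  # crude name-based filter, an assumption
            continue
        constants = cfg.get("constant_fields", {})
        if any(constants.get(key) != value for key, value in EXPECTED.items()):
            flagged.append(name)
    return sorted(flagged)


print(misclassified_sinkholes(REPORTS))  # ['event4_sinkhole_http_referer']
```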

Kind regards Thomas

On 24.01.24 11:23, Kamil Mankowski via IntelMQ-dev wrote:

...
Hi,

thanks for the patch. Could you please check whether it is correct in the upcoming ShadowServer parser mapping? https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/intel...

I'm pretty sure I was working with them to clean up such discrepancies, but we may have missed something. I don't want the next release to revert your changes unintentionally.

Best regards

// Kamil Mańkowski mankowski@cert.at - T: +43 676 898 298 7204 // CERT Austria - https://www.cert.at/ // CERT.at GmbH, FB-Nr. 561772k, HG Wien


Thomas Hungenberg

11:25 a.m.

Hi Kamil,

please find attached the updated intelmq.json schema with the necessary changes for malware-related events. I hope I haven't missed anything.

While going through the schema, I noticed two changes compared to the _config.py included with IntelMQ 3.2.1:

For compromised website: malware.name changed from "tag" to "family", which is good, as "tag" is not a malware name.

For malware_url: "malware.name = tag" has been removed, which is good, as "tag" is not a malware name.

I noticed there are some other changes as well, but I focused on malware-related attributes.

Is there any documentation of all the changes that have been made from the _config.py included with IntelMQ 3.2.1 to the JSON schema?

We need to make sure the changes do not break any scripts when switching from the static config to the schema.
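Absent such documentation, the mapping changes could be listed mechanically. This Python sketch compares two hypothetical excerpts, old _config.py-style tuples and new schema-style lists, as sets of (field, column) pairs; sets are used because 'extra.' can occur several times per report. The data and helper names are illustrative only, and conversion functions are deliberately ignored.

```python
# Hypothetical excerpts: old _config.py tuples vs. new schema lists.
OLD = {
    "compromised_website": [("malware.name", "tag")],
    "malware_url": [("malware.name", "tag"), ("source.url", "url")],
}
NEW = {
    "compromised_website": [["malware.name", "family", "validate_to_none"]],
    "malware_url": [["source.url", "url", "validate_to_none"]],
}


def pairs(entries) -> set:
    """(IntelMQ field, source column) pairs; a set, because 'extra.' may repeat."""
    return {(entry[0], entry[1]) for entry in entries}


def mapping_changes(old: dict, new: dict) -> list:
    """List added/removed field mappings per report (conversion functions ignored)."""
    changes = []
    for report in sorted(old.keys() | new.keys()):
        before = pairs(old.get(report, []))
        after = pairs(new.get(report, []))
        changes += [(report, "removed", *p) for p in sorted(before - after)]
        changes += [(report, "added", *p) for p in sorted(after - before)]
    return changes


for change in mapping_changes(OLD, NEW):
    print(change)
```

Including the third tuple element in `pairs` would additionally surface added or removed converters such as validate_to_none, at the cost of a noisier report.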

Kind regards Thomas


-- - Thomas CERT-Bund Incident Response & Malware Analysis Team

Sebix

2:30 p.m.

Dear list,

On 1/26/24 11:01, Thomas Hungenberg via IntelMQ-dev wrote:

...

I thought about this again in more detail. The classification attributes should describe the incident, becoming more specific from taxonomy to identifier. So for feeds like Open-SNMP, it makes sense to set classification.identifier to the feed's name, like this:

'classification.taxonomy': 'vulnerable',
'classification.type': 'vulnerable-system',
'classification.identifier': 'open-snmp',

I agree.

...

However, for malware events, my proposal of setting classification.identifier to the feed's name does not make sense, as a feed name like "event4-microsoft-sinkhole" is not a specific description of the incident itself but rather of the type of source of the information.

So I think it is best to keep writing the malware name ("infection" or "tag") to classification.identifier, as this is a specific description of the individual incident. However, the malware name ("infection" or "tag") also needs to be stored in malware.name for the malware name mapping to work. "family" should instead be stored in extra.

Originally, the intended use of classification.identifier and malware.name was:
- malware.name contained the original (and unprocessed) malware name. It was as specific as possible and could include the malware variant, for example "b157-rL".
- The classification.* fields should be usable for aggregation, de-duplication, statistics etc.
- For malware events, the parsers could write the malware family (e.g. "zeus") or the malware name to the identifier.
- The family took precedence, but if it was not known, the more specific malware.name could be used instead.
- It was always up to the user to replace the identifier with a more generic malware family, e.g. using the public malware name mapping and malpedia.
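The precedence rule in the list above (family first, otherwise the more specific malware.name) fits in a few lines. This is a hypothetical helper written for illustration, not a function from IntelMQ; the example values are taken from the list, but pairing them in one event is made up.

```python
from typing import Optional


def choose_identifier(family: Optional[str], malware_name: Optional[str]) -> Optional[str]:
    """Prefer the malware family; fall back to the more specific malware.name."""
    if family:
        return family
    if malware_name:
        return malware_name
    return None  # neither is known; leave classification.identifier unset


print(choose_identifier("zeus", "b157-rL"))  # zeus
print(choose_identifier(None, "b157-rL"))    # b157-rL
```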

At least until 2022, IntelMQ and all its parsers fit this concept. It may still be the case, given the recent significant changes.

https://docs.intelmq.org/latest/user/event/#meaning-of-source-and-destinatio... still contains a short summary.

best regards Sebastian

-- Institute for Common Good Technology gemeinnütziger Kulturverein - nonprofit cultural society https://commongoodtechnology.org/ ZVR 1510673578

Thomas Hungenberg

29 Jan 2024

8:49 a.m.

Hi all,

On 26.01.24 15:30, Sebix wrote:

...

Originally, the intended use of classification.identifier and malware.name was:

malware.name contained the original (and unprocessed) malware name. It was as specific as possible. It can have the malware variant. For example, "b157-rL".

The classification.* fields should be usable for aggregation, de-duplication, statistics etc.

For malware events, the parsers could write the malware family (e.g. "zeus") or the malware name to the identifier.

The family took precedence, but if not known, the more specific malware.name could be used instead.

It was always up to the user to replace the identifier with a more generic malware family, e.g. using the public malware name mapping and malpedia.

At least until 2022, IntelMQ and all its parsers fit this concept. It may still be the case, given the recent significant changes.

@Sebastian: Thanks for summarizing this well-proven concept!

The changes in the Shadowserver parser config must have happened somewhen between January and August 2022. Most likely with the adoption to the changes in the Shadowserver feeds like the move from "botnet drone" to "sinkhole events"?

In January 2022, the original (unprocessed) malware name ("infection" or "type") was still written to malware.name and "family" to extra.family. classification.identifier was left blank and could be set, e.g., by a modify expert using a malware name mapping:

==============================
drone = {
    'optional_fields': [
        ('malware.name', 'infection'),
        ('extra.', 'family', validate_to_none),
    ],
    'constant_fields': {
        # classification.identifier will be set to (harmonized) malware name by modify expert
    },
==============================

See https://github.com/certtools/intelmq/blob/747100f6ee6519a44cd157fe0b6c98f4b3585821/intelmq/bots/parsers/shadowserver/_config.py

This fits the concept mentioned above.
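To make the effect of such mapping tuples concrete, here is a simplified model of how a row from the feed could become event fields. It is a sketch under assumptions, not the actual Shadowserver parser. `apply_optional_fields` and `empty_to_none` are hypothetical, and the latter only approximates the role of validate_to_none. The bare 'extra.' key is treated as "store under extra.<column>", which matches how this thread reads the tuple ("family" ending up in extra.family).

```python
from typing import Optional


def empty_to_none(value: str) -> Optional[str]:
    """Hypothetical stand-in for validate_to_none: treat empty values as missing."""
    return value or None


# Mirrors the optional_fields of the January 2022 'drone' mapping quoted above.
DRONE_OPTIONAL_FIELDS = [
    ("malware.name", "infection"),
    ("extra.", "family", empty_to_none),
]


def apply_optional_fields(row: dict, fields: list) -> dict:
    """Copy selected feed columns into event fields, skipping missing values."""
    event = {}
    for spec in fields:
        key, column = spec[0], spec[1]
        # Use the tuple's converter if present; otherwise just drop empty values.
        convert = spec[2] if len(spec) > 2 else empty_to_none
        value = convert(row.get(column, ""))
        if value is None:
            continue
        # A bare 'extra.' key means: store under extra.<column name>.
        if key == "extra.":
            key = "extra." + column
        event[key] = value
    return event


print(apply_optional_fields({"infection": "b157-rL", "family": ""}, DRONE_OPTIONAL_FIELDS))
# {'malware.name': 'b157-rL'}
```

With this layout the raw infection name always lands in malware.name, while the family stays out of the way in extra.family.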

However, in August 2022 "infection" was no longer stored in malware.name but used as classification.identifier and malware.name was set to "family":

==============================
event_sinkhole = {
    'optional_fields': [
        ('classification.identifier', 'infection', validate_to_none),
        ('malware.name', 'family', validate_to_none),
==============================

See https://github.com/certtools/intelmq/blob/1e4a16c5594e88461f2eccad87d2ea3b62e7c955/intelmq/bots/parsers/shadowserver/_config.py

Unfortunately, this is the opposite of the well-proven concept.

With the changes I proposed last week (2024-01-26), we return to the former well-proven concept of storing "infection" (or "type") in malware.name and "family" in extra.family, as was done until 2022. This makes the Shadowserver parser consistent with other parsers for malware events (such as ctip or anubis) again.

Additionally, we store "infection" (or "type") in classification.identifier as well, to make sure every event processed by the parser has a classification.identifier. This identifier can later be replaced, e.g. with a harmonized malware name using the malware name mapping.
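The difference between the August 2022 layout and the proposed one shows up once a harmonization step rewrites classification.identifier from malware.name. The sketch below models that step very loosely, assuming as a simplification that the step always overwrites the identifier. `harmonize` and the one-entry `NAME_MAP` are hypothetical placeholders for the public malware name mapping, and events are plain dicts.

```python
# Hypothetical harmonization table; a real setup would use the public malware name mapping.
NAME_MAP = {"b157-rl": "examplefamily"}


def harmonize(event: dict) -> dict:
    """Simplified model: overwrite classification.identifier based on malware.name."""
    out = dict(event)
    name = event.get("malware.name", "")
    # Use the harmonized name if known, otherwise keep the raw malware.name.
    out["classification.identifier"] = NAME_MAP.get(name.lower(), name)
    return out


# August 2022 layout: "infection" in the identifier, empty "family" -> no malware.name.
aug_2022 = {"classification.identifier": "b157-rL"}
# Proposed layout: "infection" in both malware.name and the identifier.
proposed = {"malware.name": "b157-rL", "classification.identifier": "b157-rL"}

print(repr(harmonize(aug_2022)["classification.identifier"]))  # ''
print(harmonize(proposed)["classification.identifier"])  # examplefamily
```

In the first case the infection name is lost from classification.identifier. In the second it is harmonized, while malware.name still carries the original value for deduplication.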

Kind regards Thomas

Kamil Mankowski

30 Jan

8:10 a.m.

Hi all,

Thanks for the comments. I've forwarded the thread to Shadowserver, and they have also just joined the list (represented by @elsif, who works on the IntelMQ integration), so we can discuss the feedback directly.

@Thomas - answering your question about completed schema changes: I spoke with elsif about that a few weeks ago, and the schema changelog is available at https://github.com/The-Shadowserver-Foundation/report_schema/blob/main/compl...

Best regards

// Kamil Mańkowski mankowski@cert.at - T: +43 676 898 298 7204 // CERT Austria - https://www.cert.at/ // CERT.at GmbH, FB-Nr. 561772k, HG Wien

On 1/29/24 09:49, Thomas Hungenberg wrote:

...


elsif

3:38 p.m.

Hello,

The schema has been updated based on your feedback:

* The 'malware.name' is now mapped to 'infection' for the event4_microsoft_sinkhole, event4_microsoft_sinkhole_http, event6_sinkhole, event6_sinkhole_http, event6_sinkhole_http_referer, event_sinkhole, event_sinkhole_dns, event_sinkhole_http, and event_sinkhole_http_referer reports.
* The 'classification.identifier' is now mapped to 'infection' for the event4_microsoft_sinkhole_http, event6_sinkhole_http, event6_sinkhole_http_referer, event_sinkhole_http, and event_sinkhole_http_referer reports.
* The 'classification.taxonomy', 'classification.type', and 'protocol.application' were changed for the event6_sinkhole_http_referer and event_sinkhole_http_referer reports.
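Comparing the two report lists above shows which reports now get 'infection' only in malware.name and not in classification.identifier. The small sketch below uses nothing but the report names from this message. The variable names are illustrative.

```python
# Report names as listed in the schema update message.
MALWARE_NAME_FROM_INFECTION = {
    "event4_microsoft_sinkhole",
    "event4_microsoft_sinkhole_http",
    "event6_sinkhole",
    "event6_sinkhole_http",
    "event6_sinkhole_http_referer",
    "event_sinkhole",
    "event_sinkhole_dns",
    "event_sinkhole_http",
    "event_sinkhole_http_referer",
}
IDENTIFIER_FROM_INFECTION = {
    "event4_microsoft_sinkhole_http",
    "event6_sinkhole_http",
    "event6_sinkhole_http_referer",
    "event_sinkhole_http",
    "event_sinkhole_http_referer",
}

# Every report whose identifier comes from 'infection' also maps malware.name from it.
assert IDENTIFIER_FROM_INFECTION <= MALWARE_NAME_FROM_INFECTION

only_malware_name = sorted(MALWARE_NAME_FROM_INFECTION - IDENTIFIER_FROM_INFECTION)
print(only_malware_name)
# ['event4_microsoft_sinkhole', 'event6_sinkhole', 'event_sinkhole', 'event_sinkhole_dns']
```

These are the non-HTTP sinkhole reports. The message does not say how their classification.identifier is set.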

Regards

On 1/30/24 12:10 AM, Kamil Mankowski via IntelMQ-dev wrote:

...


Thomas Hungenberg

31 Jan

12:37 p.m.

On 30.01.24 16:38, elsif wrote:


> The schema has been updated based on your feedback:

Excellent, thanks very much!

Kind regards Thomas


intelmq-dev@lists.cert.at
