Dear IntelMQ Developers and Users,
an evaluation of current challenges with the internal data format led to the idea of allowing multiple values for one field in IntelMQ 3.0 (scheduled for June 2021)[0]. The idea is described below, including various advantages and disadvantages. We appreciate your input, opinion and analysis of further implications on this idea. We plan to evaluate the feedback that emerged in two weeks.
[0] https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur...
"The new IDF shall support (sorted) lists of IPs, domains, taxonomy categories, etc. By convention the most relevant item in such a list MUST be the first item in the sorted list."
https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur... (section "n6-system"):
"Since the new IDF shall support multiple values, mapping to n6 should be rather easy."

## Use-cases

### Network information

IntelMQ's format currently allows for *exactly one* value per field. For example, every event can have *one* `source.ip` and *one* `source.fqdn`. In some use-cases, multiple values can be useful, for example when querying DNS information: one domain (`source.fqdn`) can point to multiple IP addresses (`source.ip`). The reverse, multiple domains pointing to the same IP address, is also very common. The use-case that came up first was that one IP address can be part of multiple autonomous systems (`source.asn`).[1][2][3]
See the examples below in section Format.
[1]: "Multiple ASNs/networks per IP? #543" https://github.com/certtools/intelmq/issues/543
[2]: "BOT: DNS lookup #373" https://github.com/certtools/intelmq/issues/373
[3]: "reverse DNS: Only first record is used" https://github.com/certtools/intelmq/issues/877

### Classification

Another use-case is to use multiple classifications.[5] For example, if a website was hacked and used for a phishing page, it can be assigned two classifications:

- For the hacking: Taxonomy: information-content-security, type: unauthorised-modification-of-information
- For the phishing page: Taxonomy: fraud, type: phishing

Another example is reachable network services which should not be accessible from the internet. Shadowserver provides a lot of this data. Open XDMCP instances are both DDoS amplifiers and potentially unwanted accessible systems. Therefore both classifications apply:

- Taxonomy: vulnerable, type: ddos-amplifier
- Taxonomy: vulnerable, type: potentially-unwanted-accessible-system

A list of all fields of the RSIT can be found in the RSIT repository.[6]

[5]: https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/b...
[6]: https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/b...

## Format

Some examples:

    {"source.ip": ["192.0.43.8"], "source.asn": [16876, 40528]}
    {"source.ip": ["10.0.0.1", "10.0.0.2"], "source.url": ["http://example.com/", "http://example.net"]}
    {"classification.taxonomy": ["information-content-security", "fraud"], "classification.type": ["unauthorised-modification-of-information", "phishing"], "source.url": ["http://example.com/"], "source.ip": ["10.0.0.1", "10.0.0.2"]}
In the bots' code, multiple values need to be taken care of. For example, instead of:

    ip_addr = event["source.ip"]
    # do stuff

it is necessary to loop over the values:

    for ip_addr in event["source.ip"]:
        # do stuff

This logic is required for *all* fields which can have multiple values, therefore nested loops may be necessary.
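The nested-loop requirement can be sketched as follows. This is only an illustration under the assumption that an event behaves like a plain dict whose values are lists; `iter_combinations` is a hypothetical helper and not part of IntelMQ.

```python
from itertools import product


def iter_combinations(event, fields):
    """Yield one tuple per combination of values of the given fields.

    Assumes `event` is a plain dict with list values (hypothetical
    multi-value format). A missing or empty field contributes a single
    None so that the other fields are still iterated.
    """
    value_lists = [event.get(field) or [None] for field in fields]
    # product() replaces the hand-written nested for-loops.
    yield from product(*value_lists)


event = {"source.ip": ["10.0.0.1", "10.0.0.2"], "source.asn": [16876, 40528]}

for ip_addr, asn in iter_combinations(event, ["source.ip", "source.asn"]):
    print(ip_addr, asn)  # do stuff with each (IP, ASN) pair
```

The number of iterations is the product of the list lengths, so every additional multi-value field multiplies the work a bot has to do.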
Everything which processes IntelMQ data needs to be adapted, including databases. See the "Disadvantages" section below.
### Optional back-conversion ("value-explosion")
One variant/option of this IEP is to create a conversion layer from the new multi-value format to the old one-value format by creating multiple events with only one value per field. Using this conversion, compatibility with external components can be kept, while the advantages only exist inside the IntelMQ core (i.e. the bots).

Examples:

    {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com"]}
    ->
    {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
    {"source.ip": "127.0.0.2", "source.fqdn": "example.com"}

    {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com", "example.org"]}
    ->
    {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
    {"source.ip": "127.0.0.1", "source.fqdn": "example.org"},
    {"source.ip": "127.0.0.2", "source.fqdn": "example.com"},
    {"source.ip": "127.0.0.2", "source.fqdn": "example.org"}
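A minimal sketch of such a conversion layer, assuming events are plain dicts and that every combination of values should become its own event. The function name `explode` is hypothetical; the proposal does not define an implementation.

```python
from itertools import product


def explode(event):
    """Turn one multi-value event into a list of single-value events.

    Scalars are wrapped so that old-style (single-value) fields pass
    through unchanged. Note that a field with an empty list yields no
    events at all, because the cartesian product is then empty.
    """
    fields = list(event)
    value_lists = [v if isinstance(v, list) else [v] for v in event.values()]
    return [dict(zip(fields, combo)) for combo in product(*value_lists)]


exploded = explode({
    "source.ip": ["127.0.0.1", "127.0.0.2"],
    "source.fqdn": ["example.com", "example.org"],
})
for single in exploded:
    print(single)
```

The result contains one event per combination, in the same order as the second example above.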
### What will change?

We'll change the behaviour of IntelMQ's current internal parsing process: you'll be able to add multiple IP addresses to one field, so that data which is currently handled as multiple events can be merged into one event. This allows us to combine, for example, a domain with multiple IP addresses into one event.
### Advantages
Supporting multiple values allows us to add multiple IP addresses to one event. Compared with using multiple events with nearly identical data, the multiple-value approach reduces data duplication and has less overhead, at the cost of increased complexity. If multiple events were used instead, related events would need to be linked together by other means (see section "Alternatives" below).
### Disadvantages (breaking behaviour)
The complexity in IntelMQ and all linked components undoubtedly increases. All components dealing with IntelMQ data need to be adapted to deal with multiple values. This includes all bots, and IntelMQ administrators need to adapt their configurations (e.g. filters) as well.

Without the explosion variant, all connected databases (e.g. PostgreSQL, SQLite, Elastic, MongoDB) and all software processing data from IntelMQ need to be adapted as well. PostgreSQL supports arrays for columns, but the schema conversion can be complex and resource-hungry.

IntelMQ has followed the KISS ("keep it simple, stupid")[4] principle from its beginning. It is disputable whether multiple values break with this principle.

[4]: https://en.wikipedia.org/wiki/KISS_principle

## Alternatives
An alternative to using multiple values per field is to set a unique identifier (e.g. a UUID) per event and let events with the same origin share the same "parent" identifier. This way, related events can be linked and compatibility is easier to maintain. Relating the events to each other requires extra steps, but keeps the KISS principle. This approach will be described in IEP04.
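As a rough illustration of this alternative, the sketch below creates one single-value event per value and links them through a shared parent identifier. The field names `meta.uuid.current` and `meta.uuid.parent` and the helper `split_with_parent` are assumptions made for this example; the actual design is left to IEP04.

```python
import uuid


def split_with_parent(values, field, base_event):
    """Create one single-value event per value, linked by a parent id.

    `meta.uuid.current` / `meta.uuid.parent` are hypothetical field
    names used for illustration only.
    """
    parent_id = str(uuid.uuid4())
    events = []
    for value in values:
        event = dict(base_event)  # copy, so the events stay independent
        event[field] = value
        event["meta.uuid.current"] = str(uuid.uuid4())
        event["meta.uuid.parent"] = parent_id
        events.append(event)
    return events


children = split_with_parent(
    ["10.0.0.1", "10.0.0.2"], "source.ip", {"source.fqdn": "example.com"}
)
```

Each resulting event keeps the old one-value format, so existing outputs and databases only need the two extra identifier fields.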
To solve the use-case of multiple classifications per event, the primary and most important classification can be used instead of multiple ones.
A possible solution for the classification use-case above would be to use some sort of tagging, in short "tags". For example:

    {
        "source.ip": ["192.0.43.8"],
        "source.asn": [16876, 40528],
        "tags": ["ddos-amplifier", "info-disclosure", "mirai-botnet"]
    }
## Other IoC processing formats
For reference, we describe the formats of other IoC-processing systems similar to IntelMQ. Both formats, IDEA and n6, support multiple values, each in a different way. If you know of other similar formats supporting multiple values, please speak up!
### "IDEA"
The IDEA format, used by the CESNET-developed Warden, supports multiple values for some fields. However, the structure of the data format differs clearly from IntelMQ's, as you can see in the example below. The classification is defined per address, and network ranges are possible as addresses, which is not supported in IntelMQ. IDEA was designed from scratch to overcome disadvantages of Warden's previous data format.

Example:

    "Source": [
        {
            "Type": ["Phishing"],
            "IP4": ["192.168.0.2-192.168.0.5", "192.168.0.10/25"],
            "IP6": ["2001:0db8:0000:0000:0000:ff00:0042::/112"],
            "Hostname": ["example.com"],
            "URL": ["http://example.com/cgi-bin/killemall"],
            "Proto": ["tcp", "http"],
            "AttachHand": ["att1"],
            "Netname": ["ripe:IANA-CBLK-RESERVED1"]
        }
    ],
    "Target": [
        {
            "Type": ["Backscatter", "OriginSpam"],
            "Email": ["innocent@example.com"],
            "Spoofed": true
        },
        {
            "IP4": ["10.2.2.0/24"],
            "Anonymised": true
        }
    ]
Upstream documentation: https://idea.cesnet.cz/en/index https://warden.cesnet.cz/en/index
### n6
In the n6 format, the `addr` field is a list of objects with `ip`, `asn`, `cc` and `dir` fields. `addr` is similar to IntelMQ's `source` namespace, but `addr` is much smaller and the "direction" of the address is given by a field inside the `addr` item.
Example: [{"ipv6": "abcd::1", "cc": "PL", "asn": 12345, "dir": "dst"}]
Upstream documentation: https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-addressfiel... https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-extendedadd...
Hi,
On Tuesday, 30 March 2021, 17:53:47, Sebastian Waldbauer wrote:
We plan to evaluate the feedback that emerged in two weeks.
thanks for writing IEPs and asking for feedback on the lists.
My suggestions:

* As the text gets very detailed, it would profit from a formatted version on the web. The email could then have the link and the markup, or just the link (whatever people prefer).
* When pinging two lists, you should name one list where the replies and the discussion should go; otherwise people on both lists get duplicated mails.
Regards, Bernhard
Hi,
On 3/31/21 1:31 PM, Bernhard Reiter wrote:
My suggestions:
- As the text gets very detailed, it would profit from a formatted version on the web. The email could then have the link and the markup, or just the link (whatever people prefer).
Good idea, we can do so next time. I've also created an issue here so we don't forget to do this for the previous IEPs: https://github.com/certtools/intelmq/issues/1839
- When pinging two lists, you should name one list where the replies and the discussion should go; otherwise people on both lists get duplicated mails.
Thanks, that's a good idea. As both lists don't have the same subscribers (there's some overlap though) and we'd like to hear the feedback from the full community, as the change affects both developers and users, we sent it to both lists. Sending the IEPs only to dev and a short heads-up to users, with a Reply-To header set, would reduce the chaos a bit.
Sebastian
On 4/1/21 11:52 AM, Sebastian Wagner wrote:
On 3/31/21 1:31 PM, Bernhard Reiter wrote:
My suggestions:
- As the text gets very detailed, it would profit from a formatted version on the web. The email could then have the link and the markup, or just the link (whatever people prefer).
Good idea, we can do so next time. I've also created an issue here so we don't forget to do this for the previous IEPs: https://github.com/certtools/intelmq/issues/1839
The suggestion is now implemented at https://github.com/certtools/ieps
Sebastian
Hello,
again a few notes based on Idea experience. :)
From: Sebastian Waldbauer waldbauer@cert.at, Date: Mar 30, 2021

## Use-cases

### Network information

IntelMQ's format currently allows for *exactly one* value per field. For example, every event can have *one* `source.ip` and *one* `source.fqdn`. In some use-cases, multiple values can be useful, for example when querying DNS information: one domain (`source.fqdn`) can point to multiple IP addresses (`source.ip`). The reverse, multiple domains pointing to the same IP address, is also very common. The use-case that came up first was that one IP address can be part of multiple autonomous systems (`source.asn`).[1][2][3]

Do all `source.fqdn` values have to correspond with `source.ip` and `source.asn`?
Consider:
    source.ip: [78.128.216.141, 2001:718:ff05:202::141, 78.128.211.46, 2001:718:1:1f:50:56ff:feee:46]
    source.fqdn: [idea.cesnet.cz, www.cesnet.cz, cesnet.cz]

The relation of which IPs correspond to which FQDNs is lost here.

If it's not to be lost, you need another level of nesting/indirection - or you can _require_ all fields to correspond, split events accordingly where that's not the case, and implement both this and also a variation of IEP04 (where you may face the cartesian explosion problem I mentioned in my reaction there). Something akin to:

Event 1

    meta.uuid.current: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
    source.ip: [78.128.216.141, 2001:718:ff05:202::141]
    source.fqdn: [idea.cesnet.cz]

Event 2

    meta.uuid.current: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
    meta.uuid.parent: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
    source.ip: [78.128.211.46, 2001:718:1:1f:50:56ff:feee:46]
    source.fqdn: [www.cesnet.cz, cesnet.cz]
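One way to picture the "require all fields to correspond" rule is to group FQDNs by the set of IPs they resolve to, so that each resulting event only contains values that belong together. This is a sketch under the assumption that the resolver output is available as a dict from FQDN to IP list; `group_by_ip_set` is a hypothetical helper, and the linking via UUIDs is left out for brevity.

```python
def group_by_ip_set(resolutions):
    """Group FQDNs that resolve to the same set of IPs into one event.

    `resolutions` maps each FQDN to the list of IPs it resolves to
    (assumed input shape). Every returned event is internally
    consistent: all listed FQDNs correspond to all listed IPs.
    """
    groups = {}
    for fqdn, ips in resolutions.items():
        key = frozenset(ips)  # order-independent identity of the IP set
        if key not in groups:
            groups[key] = {"source.ip": list(ips), "source.fqdn": []}
        groups[key]["source.fqdn"].append(fqdn)
    return list(groups.values())


events = group_by_ip_set({
    "idea.cesnet.cz": ["78.128.216.141", "2001:718:ff05:202::141"],
    "www.cesnet.cz": ["78.128.211.46", "2001:718:1:1f:50:56ff:feee:46"],
    "cesnet.cz": ["78.128.211.46", "2001:718:1:1f:50:56ff:feee:46"],
})
```

With the data from the example above this yields two events, matching "Event 1" and "Event 2".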
### Classification
...
## Format

    {"classification.taxonomy": ["information-content-security", "fraud"], "classification.type": ["unauthorised-modification-of-information", "phishing"]

I believe (feel free to correct me) that RSIT does not preclude using just the first-level category in cases where the second level is ambiguous or unknown, so in the two-array format you could solve it, for example, like this:

    {
        "classification.taxonomy": ["information-content-security", "fraud"],
        "classification.type": [null, "phishing"]
    }

In Idea we went for a "merged" field; here it might look like:

    classification: ["information-content-security.unauthorised-modification-of-information", "fraud.phishing"]

or, considering a missing second level:

    classification: ["information-content-security", "fraud.phishing"]
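The conversion from the two-array form to the merged form can be sketched in a few lines. The helper name `merge_classification` is made up for this example, and a missing second level is assumed to be represented as `None` (JSON `null`).

```python
def merge_classification(taxonomies, types):
    """Merge parallel taxonomy/type lists into "taxonomy.type" labels.

    A type of None means the second level is unknown, in which case
    only the first-level taxonomy is kept.
    """
    if len(taxonomies) != len(types):
        raise ValueError("taxonomy and type lists must have the same length")
    return [
        taxonomy if type_ is None else f"{taxonomy}.{type_}"
        for taxonomy, type_ in zip(taxonomies, types)
    ]


merged = merge_classification(
    ["information-content-security", "fraud"],
    [None, "phishing"],
)
```

A side effect of the merged form is that the positional coupling between two separate arrays disappears, so the lists cannot get out of sync.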
### Optional back-conversion ("value-explosion")
One variant/option of this IEP is to create a conversion layer from the new multi-value format to the old one-value format by creating multiple events with only one value per field. Using this conversion, compatibility with external components can be kept, while the advantages only exist inside the IntelMQ core (i.e. the bots).

Examples:

    {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com"]}
    ->
    {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
    {"source.ip": "127.0.0.2", "source.fqdn": "example.com"}

    {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com", "example.org"]}
    ->
    {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
    {"source.ip": "127.0.0.1", "source.fqdn": "example.org"},
    {"source.ip": "127.0.0.2", "source.fqdn": "example.com"},
    {"source.ip": "127.0.0.2", "source.fqdn": "example.org"}

Ah, here goes cartesian. :) In theory this could work. In reality - don't do that. We tried. Soon somebody starts to use multiple values for scans and DDoSes, and you really do not want to grind the processing machine to a halt by creating a specific event for each of 200 source IPs times 150 FQDNs times 400 target IPs times 350 FQDNs times 50 ASNs, times ... This gets to too big numbers too fast.
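To put a number on this, the count of exploded events is simply the product of the list lengths. The sketch below uses the sizes from the paragraph above; mapping "target" to IntelMQ's `destination.*` fields is an assumption made for the example, and `exploded_event_count` is a hypothetical helper.

```python
from math import prod


def exploded_event_count(event):
    """Number of single-value events a full cartesian explosion creates."""
    return prod(len(v) if isinstance(v, list) else 1 for v in event.values())


# List sizes taken from the example above; field names are illustrative.
sizes = {
    "source.ip": 200,
    "source.fqdn": 150,
    "destination.ip": 400,
    "destination.fqdn": 350,
    "source.asn": 50,
}
event = {field: list(range(n)) for field, n in sizes.items()}

count = exploded_event_count(event)
print(f"{count:,}")  # 210,000,000,000
```

A single multi-value event of this size would turn into 2.1e11 single-value events, which is why counting before exploding (and refusing above some limit) seems advisable if this variant were ever pursued.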
IntelMQ has followed the KISS ("keep it simple, stupid")[4] principle from its beginning. It is disputable whether multiple values break with this principle.
Depends on the use-case - you might decide against multiple values just because the majority of IntelMQ users and use-cases do not need them, and the complexity increase outweighs the benefit. We had to bite the bullet because we have a number of our own data sources which are inherently M:N. YMMV.
## Alternatives
An alternative to using multiple values per field is to set a unique identifier (e.g. a UUID) per event and let events with the same origin share the same "parent" identifier. This way, related events can be linked and compatibility is easier to maintain. Relating the events to each other requires extra steps, but keeps the KISS principle. This approach will be described in IEP04.
There are complications even here - how long should a reader wait for possible child events? How does it know it has a complete set before processing it and/or sending it forward?
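The completeness problem can be made concrete with a small buffer that holds events per parent identifier and only releases a group after it has been quiet for a while. This is a heuristic sketch, not a solution: the class `ParentBuffer` and the `quiet_seconds` parameter are invented for illustration, and a late child event would still arrive after its group has been released.

```python
class ParentBuffer:
    """Hold events per parent id and release groups that have gone quiet."""

    def __init__(self, quiet_seconds):
        self.quiet_seconds = quiet_seconds
        self._groups = {}  # parent id -> (time of last event, [events])

    def add(self, parent_id, event, now):
        """Store an event and remember when its group was last touched."""
        _, events = self._groups.get(parent_id, (now, []))
        events.append(event)
        self._groups[parent_id] = (now, events)

    def release(self, now):
        """Return and forget all groups without new events for quiet_seconds."""
        ready = {
            parent_id: events
            for parent_id, (last_seen, events) in self._groups.items()
            if now - last_seen >= self.quiet_seconds
        }
        for parent_id in ready:
            del self._groups[parent_id]
        return ready


buffer = ParentBuffer(quiet_seconds=60)
buffer.add("aaaa", {"source.ip": "10.0.0.1"}, now=0)
buffer.add("aaaa", {"source.ip": "10.0.0.2"}, now=10)
too_early = buffer.release(now=30)  # group was touched 20 s ago: still held
complete = buffer.release(now=70)   # 60 s of silence: group is released
```

Any choice of `quiet_seconds` trades latency against the risk of forwarding an incomplete set, which is exactly the question raised above.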
## Other IoC processing formats
For reference, we describe the formats of other IoC-processing systems similar to IntelMQ. Both formats, IDEA and n6, support multiple values, each in a different way. If you know of other similar formats supporting multiple values, please speak up!
As Idea is loosely based on IDMEF, I've been contacted by the Prelude SIEM guys, who are trying to do similar things at https://www.secef.net/. I haven't had time to review their work though.
Cheers -- Pavel Kácha, CESNET
Hey List & Hey Pavel!
Thanks for your awesome feedback on the IEP03 :)
On 3/31/21 5:27 PM, Pavel Kácha wrote:
Hello,
again a few notes based on Idea experience. :)
From: Sebastian Waldbauer waldbauer@cert.at, Date: Mar 30, 2021

## Use-cases

### Network information

IntelMQ's format currently allows for *exactly one* value per field. For example, every event can have *one* `source.ip` and *one* `source.fqdn`. In some use-cases, multiple values can be useful, for example when querying DNS information: one domain (`source.fqdn`) can point to multiple IP addresses (`source.ip`). The reverse, multiple domains pointing to the same IP address, is also very common. The use-case that came up first was that one IP address can be part of multiple autonomous systems (`source.asn`).[1][2][3]

Do all `source.fqdn` values have to correspond with `source.ip` and `source.asn`? Consider:

    source.ip: [78.128.216.141, 2001:718:ff05:202::141, 78.128.211.46, 2001:718:1:1f:50:56ff:feee:46]
    source.fqdn: [idea.cesnet.cz, www.cesnet.cz, cesnet.cz]

The relation of which IPs correspond to which FQDNs is lost here. If it's not to be lost, you need another level of nesting/indirection - or you can _require_ all fields to correspond, split events accordingly where that's not the case, and implement both this and also a variation of IEP04 (where you may face the cartesian explosion problem I mentioned in my reaction there). Something akin to:

Event 1

    meta.uuid.current: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
    source.ip: [78.128.216.141, 2001:718:ff05:202::141]
    source.fqdn: [idea.cesnet.cz]

Event 2

    meta.uuid.current: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
    meta.uuid.parent: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
    source.ip: [78.128.211.46, 2001:718:1:1f:50:56ff:feee:46]
    source.fqdn: [www.cesnet.cz, cesnet.cz]

IMHO, I wouldn't use multiple values in source fields, as IntelMQ data would get more complex and this would break the KISS principle. Using the parent UUID would solve this nesting problem, as you can use UUIDs to connect similar events to each other. I'd propose to use multiple values only in fields where it doesn't get too complex, like `tags`. Tags can be used to add specific labels such as campaigns.
### Classification
...
## Format

    {"classification.taxonomy": ["information-content-security", "fraud"], "classification.type": ["unauthorised-modification-of-information", "phishing"]

I believe (feel free to correct me) that RSIT does not preclude using just the first-level category in cases where the second level is ambiguous or unknown, so in the two-array format you could solve it, for example, like this:

    {
        "classification.taxonomy": ["information-content-security", "fraud"],
        "classification.type": [null, "phishing"]
    }

In Idea we went for a "merged" field; here it might look like:

    classification: ["information-content-security.unauthorised-modification-of-information", "fraud.phishing"]

or, considering a missing second level:

    classification: ["information-content-security", "fraud.phishing"]

Agree with the "tagging" style, as it can contain a lot of information & can be set per event.

I'd suggest discussing this in a hackathon, so there might be more input & use-cases on that one.
Hello Sebastian,
From: Sebastian Waldbauer waldbauer@cert.at, Date: Apr 06, 2021
IMHO, I wouldn't use multiple values in source fields, as IntelMQ data would get more complex and this would break the KISS principle. Using the parent UUID would solve this nesting problem, as you can use UUIDs to connect similar events to each other. I'd propose to use multiple values only in fields where it doesn't get too complex, like `tags`. Tags can be used to add specific labels such as campaigns.
Sure, then you'll have to embrace IEP04, however with its own set of problems (M:N, alias "difficult to describe DDoS"; the data completeness problem, alias "how long should I wait to be reasonably sure I have the complete set of events?"). We wanted to take the analysis/assembly/complexity burden off readers (because we have our experience with IDMEF, IODEF and MISP, which seem to me writer-friendly, not reader-friendly :) ), so we went for (hopefully reasonably) increased complexity and against dropping (too many) features - and for trying to solve most of the problems on our side. The real world is usually not KISS. :)
### Classification
...
## Format

    {"classification.taxonomy": ["information-content-security", "fraud"], "classification.type": ["unauthorised-modification-of-information", "phishing"]

I believe (feel free to correct me) that RSIT does not preclude using just the first-level category in cases where the second level is ambiguous or unknown, so in the two-array format you could solve it, for example, like this:

    {
        "classification.taxonomy": ["information-content-security", "fraud"],
        "classification.type": [null, "phishing"]
    }

In Idea we went for a "merged" field; here it might look like:

    classification: ["information-content-security.unauthorised-modification-of-information", "fraud.phishing"]

or, considering a missing second level:

    classification: ["information-content-security", "fraud.phishing"]
Agree with the "tagging" style, as it can contain a lot of information & can be set per event.
Just a note that came to my mind - in RSIT, the second level implies the first level (all second-level labels are unique and belong to exactly one first-level label), so (for completeness of information, not necessarily for clarity) "unauthorised-modification-of-information" is in fact enough, instead of "information-content-security.unauthorised-modification-of-information". However, explicit is usually better than implicit. :)
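Because each second-level label belongs to exactly one first-level label, the taxonomy can be derived from the type with a simple lookup. The sketch below only contains the four pairs mentioned in this thread, not the full RSIT; the names `TYPE_TO_TAXONOMY` and `taxonomies_for` are made up for illustration.

```python
# Excerpt only: the type/taxonomy pairs that appear in this thread.
TYPE_TO_TAXONOMY = {
    "unauthorised-modification-of-information": "information-content-security",
    "phishing": "fraud",
    "ddos-amplifier": "vulnerable",
    "potentially-unwanted-accessible-system": "vulnerable",
}


def taxonomies_for(types):
    """Derive the first-level taxonomy for each second-level type."""
    return [TYPE_TO_TAXONOMY[type_] for type_ in types]


derived = taxonomies_for(["unauthorised-modification-of-information", "phishing"])
```

Storing only the types and deriving the taxonomies avoids inconsistent pairs, at the price of needing the lookup table wherever the data is read.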
-- Pavel
Dear community,
In today's hackathon we discussed IEP03 in detail.
As described in the original proposal, IEP03 was based on the IntelMQ 3.0 architecture document[0]. The discussion we just had showed that there are definitely use-cases which can be enhanced by such a data format change, and that IntelMQ can evolve in this direction in the future. It was also pointed out that the change does not necessarily break KISS, as the implementation should be just as complex as it needs to be to solve the problem, but no more complex. However, the use-cases known as of IntelMQ 3.0 are not sufficient to implement this major change at this stage and for IntelMQ 3.0, given the big negative impact. Other use-cases which would support such a feature are not yet known in enough detail and need to be collected and described on a larger scale first, with a vision for IntelMQ 4.0 in mind. The IntelMQ Architecture Board, which is being started now, will support this process. IEP04 will be adapted to incorporate the use-cases covered by IEP03.
Thanks again to everyone for your valuable input and your engagement to bring IntelMQ forward!
best regards Sebastian
[0] https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur...