[IntelMQ-users] IEP03: IntelMQ Data Format - Multiple Values

Pavel Kácha ph at cesnet.cz
Wed Mar 31 17:27:20 CEST 2021


Hello,

   again a few notes based on Idea experience. :)

> From: Sebastian Waldbauer <waldbauer at cert.at>, Date: Mar 30, 2021
>
> ## Use-cases
> ### Network information
> IntelMQ's format currently allows for *exactly one* value per field. For
> example, every event can have *one* `source.ip` and *one* `source.fqdn`. In
> some use-cases, multiple values can be useful, for example when querying DNS
> information. One domain (`source.fqdn`) can point to multiple IP addresses
> (`source.ip`). The other way round, multiple domains point to the same IP
> address is also very common. The use-case first appeared was that one IP
> address can be part of multiple Autonomous systems (`source.asn`).[1][2][3]

   Do all source.fqdn values have to correspond with source.ip and source.asn?

   Consider:

   source.ip: [78.128.216.141, 2001:718:ff05:202::141, 78.128.211.46, 2001:718:1:1f:50:56ff:feee:46]
   source.fqdn: [idea.cesnet.cz, www.cesnet.cz, cesnet.cz]

   The relation of which IPs correspond to which FQDNs is lost here.

   If it's not to be lost, you need another level of nesting/indirection -
or you can _require_ all fields to correspond, split events accordingly
where that's not the case, and implement both this and a variation of IEP04
(where you may face the cartesian explosion problem I mentioned in my
reaction there). Something akin to:

Event 1
   meta.uuid.current: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
   source.ip: [78.128.216.141, 2001:718:ff05:202::141]
   source.fqdn: [idea.cesnet.cz]

Event 2
   meta.uuid.current: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
   meta.uuid.parent: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
   source.ip: [78.128.211.46, 2001:718:1:1f:50:56ff:feee:46]
   source.fqdn: [www.cesnet.cz, cesnet.cz]
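
   A minimal Python sketch of such a split, assuming the producer already
knows which IPs belong to which FQDNs. The helper name `split_by_relation`
and the (ips, fqdns) group layout are hypothetical, not part of IntelMQ;
only the field names are taken from the example above.

```python
import uuid


def split_by_relation(groups):
    """Turn (ips, fqdns) groups into events linked by a parent UUID.

    Hypothetical helper: the first group becomes the parent event,
    every further group becomes a child pointing at it.
    """
    events = []
    parent_id = None
    for ips, fqdns in groups:
        event = {
            "meta.uuid.current": str(uuid.uuid4()),
            "source.ip": list(ips),
            "source.fqdn": list(fqdns),
        }
        if parent_id is None:
            # First event of the set: remember its UUID as the parent.
            parent_id = event["meta.uuid.current"]
        else:
            event["meta.uuid.parent"] = parent_id
        events.append(event)
    return events


groups = [
    (["78.128.216.141", "2001:718:ff05:202::141"], ["idea.cesnet.cz"]),
    (["78.128.211.46", "2001:718:1:1f:50:56ff:feee:46"],
     ["www.cesnet.cz", "cesnet.cz"]),
]
events = split_by_relation(groups)
```

   Within each resulting event every IP corresponds to every FQDN, so the
relation survives, at the price of more events.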

> ### Classification
...
> ## Format
> {"classification.taxonomy": ["information-content-security", "fraud"],
> "classification.type": ["unauthorised-modification-of-information",
> "phishing"]

   I believe (feel free to correct me) that RSIT does not preclude using
just the first-level category in cases where the second level is ambiguous
or unknown, so in the two-array format you could solve it, for example, like:

   {
      "classification.taxonomy": ["information-content-security", "fraud"],
      "classification.type": [null, "phishing"]
   }

   In Idea we went for a "merged" field; here it might look like:

   classification: [
      "information-content-security.unauthorised-modification-of-information",
      "fraud.phishing"
   ]

   or, considering a missing second level:

   classification: [
      "information-content-security",
      "fraud.phishing"
   ]
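
   As a rough sketch, the two-array form can be folded into such a merged
field in a few lines of Python. The function name `merge_classification` is
made up for illustration, and joining with a dot is just the convention used
in the examples above, not an existing IntelMQ or Idea API.

```python
def merge_classification(taxonomies, types):
    """Join parallel taxonomy/type lists into dotted strings.

    A missing second level (None) yields the bare taxonomy.
    """
    if len(taxonomies) != len(types):
        raise ValueError("taxonomy and type lists must have equal length")
    return [
        taxonomy if type_ is None else f"{taxonomy}.{type_}"
        for taxonomy, type_ in zip(taxonomies, types)
    ]


merged = merge_classification(
    ["information-content-security", "fraud"],
    [None, "phishing"],
)
```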

> ### Optional back-conversion ("value-explosion")
> 
> One variant/option of this IEP is to create a conversion layer from the new
> multi-value format to the old one-value format by creating multiple events
> with only one value per field. Using this conversion, compatibility with
> external components can be kept, while the advantages only exist inside the
> IntelMQ core (ie. the bots).
> 
> Examples:
> {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com"]}
>     -> {"source.ip": "127.0.0.1", "source.fqdn": ["example.com"]},
> {"source.ip": "127.0.0.2", "source.fqdn": ["example.com"]}
> {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com",
> "example.org"]}
>     -> {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
> {"source.ip": "127.0.0.1", "source.fqdn": "example.org"}, {"source.ip":
> "127.0.0.2", "source.fqdn": "example.com"}, {"source.ip": "127.0.0.2",
> "source.fqdn": "example.org"}

   Ah, here goes cartesian. :) In theory this could work. In reality - don't
do that. We tried. Soon somebody starts to use multiple values for scans and
DDoSes, and you really do not want to grind the processing machine to a
halt by creating a separate event for each of 200 source IPs times 150 FQDNs
times 400 target IPs times 350 FQDNs times 50 ASNs, times ...
   The numbers get too big too fast.
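
   To put a number on it, here is a small Python sketch of the explosion.
The `explode` and `exploded_count` helpers are illustrative only, and the
field names in `sizes` (e.g. `destination.ip` for the "target" side) are my
assumption about how the counts above would map to fields.

```python
import itertools
import math


def exploded_count(event):
    """Number of single-value events a full cartesian explosion creates."""
    return math.prod(len(v) for v in event.values() if isinstance(v, list))


def explode(event):
    """Lazily yield one single-value event per combination of list values."""
    keys = list(event)
    pools = [v if isinstance(v, list) else [v] for v in event.values()]
    for combo in itertools.product(*pools):
        yield dict(zip(keys, combo))


# The small example from the proposal: 2 IPs x 2 FQDNs -> 4 events.
small = {
    "source.ip": ["127.0.0.1", "127.0.0.2"],
    "source.fqdn": ["example.com", "example.org"],
}
small_events = list(explode(small))

# The scan/DDoS case: only count, do not materialise.
sizes = {
    "source.ip": 200,
    "source.fqdn": 150,
    "destination.ip": 400,
    "destination.fqdn": 350,
    "source.asn": 50,
}
big_count = math.prod(sizes.values())  # 210_000_000_000
```

   That is 2.1 * 10^11 events from a single multi-value event, before any
further multi-value field is added.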

> IntelMQ followed the KISS ("keep it simple, stupid")[4] principle from its
> beginning. It is disputable if multiple values breaks with this principle.
> 
> [4]: https://en.wikipedia.org/wiki/KISS_principle

   Depends on the use-case - you might decide against multivalues simply
because the majority of IntelMQ users and use-cases do not need them, so the
benefit does not outweigh the increase in complexity. We had to bite the
bullet, because we have a number of our own data sources which are
inherently M:N. YMMV.

> ## Alternatives
> 
> An alternative to using multiple values per field is to set unique
> identifiers (e.g. UUID) per event and let events with the same origin have
> the same "parent" identifier. This way, related events can be linked and
> compatibility is easier. Relating the events to each other requires extra
> steps although, but keeps the KISS principle. This approach will be
> described in IEP04.

   Complications arise even here - how long should a reader wait for
possible child events? How does it know it has a complete set before
processing it and/or sending it forward?
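
   To illustrate the problem, here is a naive Python sketch of a reader that
groups events by parent UUID and treats a quiet period as "the set is
complete". The `ChildCollector` class and its `wait` parameter are purely
hypothetical, and the timeout is exactly the kind of guess I mean.

```python
import time


class ChildCollector:
    """Buffer events by parent UUID, release a group after a quiet period.

    Hypothetical sketch: `wait` seconds without a new member is taken as
    "the set is complete", which is only a heuristic.
    """

    def __init__(self, wait=30.0, clock=time.monotonic):
        self.wait = wait
        self.clock = clock
        self.groups = {}      # root uuid -> list of events
        self.last_seen = {}   # root uuid -> time of last arrival

    def add(self, event):
        # A parent event is filed under its own UUID, a child under its parent's.
        root = event.get("meta.uuid.parent", event["meta.uuid.current"])
        self.groups.setdefault(root, []).append(event)
        self.last_seen[root] = self.clock()

    def flush(self):
        """Return groups that have been quiet for at least `wait` seconds."""
        now = self.clock()
        done = [r for r, t in self.last_seen.items() if now - t >= self.wait]
        result = []
        for root in done:
            result.append(self.groups.pop(root))
            del self.last_seen[root]
        return result
```

   A child arriving after its group has been flushed starts a new, orphaned
group, so whatever value of `wait` is chosen, it trades latency against
completeness.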

> ## Other IoC processing formats
>
> For reference, we describe the formats of other IoC-processing systems similar
> to IntelMQ. Both formats, IDEA and n6 do support multiple values in different
> kinds. If you know of other similar formats supporting multiple values, please
> speak up!

   As Idea is loosely based on IDMEF, I've been contacted by the Prelude
SIEM guys, who are trying to do similar things at: https://www.secef.net/
   I haven't had time to review their work, though.

Cheers
-- Pavel Kácha, CESNET