<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear community,</p>
<p>In today's hackathon we discussed IEP03 in detail.</p>
<p>As described in the original proposal, IEP03 was based on the
IntelMQ 3.0 architecture document[0]. The discussion we just had
showed that there are definitely use-cases which can be enhanced
by such a data format change, and IntelMQ can evolve in such a
direction in the future. It was also pointed out that the change
does not necessarily break KISS, as the implementation should be
just as complex as it needs to be to solve the problem, but no more
complex. However, the use-cases known as of IntelMQ 3.0 are not
sufficient to justify this major change at this stage, for
IntelMQ 3.0, given its big negative impact. Other use-cases which
would support such a feature are not yet known in enough detail and
need to be collected and described on a larger scale first, with a
vision for IntelMQ 4.0 in mind. The IntelMQ Architecture Board,
which is being started now, will support this process. IEP04 will
be adapted to incorporate the use-cases covered by IEP03.</p>
<p>Thanks again to everyone for your valuable input and your
engagement to bring IntelMQ forward!<br>
</p>
<p>best regards<br>
Sebastian<br>
</p>
<p>[0]
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements">https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements</a></p>
<div class="moz-cite-prefix">On 3/30/21 5:53 PM, Sebastian Waldbauer
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1c03bb68-0bd8-b1ee-5dcb-452df28d9ffa@cert.at">Dear
IntelMQ Developers and Users,
<br>
<br>
an evaluation of current challenges with the internal data format
led to the idea of allowing multiple values for one field in
IntelMQ 3.0 (scheduled for June 2021)[0].
<br>
The idea is described below, including various advantages and
disadvantages. We appreciate your input, opinion and analysis of
further implications on this idea.
<br>
We plan to evaluate the feedback received in two weeks' time.
<br>
<br>
[0]
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements">https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements</a><br>
"The new IDF shall support (sorted) lists of IPs, domains,
taxonomy categories, etc. By convention the most relevant item in
such a list MUST be the first item in the sorted list."
<br>
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-interoperability-with-certpls">https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-interoperability-with-certpls</a>-[0]:
n6-system
<br>
"Since the new IDF shall support multiple values, mapping to
n6 should be rather easy."
<br>
<br>
## Use-cases
<br>
### Network information
<br>
IntelMQ's format currently allows for *exactly one* value per
field. For example, every event can have *one* `source.ip` and
*one* `source.fqdn`. In some use-cases, multiple values can be
useful, for example when querying DNS information. One domain
(`source.fqdn`) can point to multiple IP addresses (`source.ip`).
The reverse, multiple domains pointing to the same IP address,
is also very common. The use-case that first came up was that one IP
address can be part of multiple autonomous systems
(`source.asn`).[1][2][3]
<br>
<br>
See the examples below in section Format.
<br>
<br>
[1]: "Multiple ASNs/networks per IP? #543"
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/543">https://github.com/certtools/intelmq/issues/543</a>
<br>
[2]: "BOT: DNS lookup #373"
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/373">https://github.com/certtools/intelmq/issues/373</a>
<br>
[3]: "reverse DNS: Only first record is used"
<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/877">https://github.com/certtools/intelmq/issues/877</a>
<br>
<br>
### Classification
<br>
Another use-case is to use multiple classifications.[5] For
example, if a website was hacked and used for a phishing page, it
can be assigned two classifications:
<br>
For the hacking: Taxonomy: information-content-security, type:
unauthorised-modification-of-information
<br>
For the phishing page: Taxonomy: fraud, type: phishing
<br>
<br>
Another example is reachable network services which should not
be accessible from the internet. Shadowserver provides a lot of this
data.
<br>
Open XDMCP instances are both DDoS amplifiers and Potentially
unwanted accessible systems. Therefore both classifications apply:
<br>
Taxonomy: vulnerable, type: ddos-amplifier
<br>
Taxonomy: vulnerable, type: potentially-unwanted-accessible-system
<br>
<br>
A list of all fields on the RSIT can be found in the RSIT
repository[6]
<br>
<br>
[5]:
<a class="moz-txt-link-freetext" href="https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/Documentation/Usage.md#user-content-multiple-classifications">https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/Documentation/Usage.md#user-content-multiple-classifications</a><br>
[6]:
<a class="moz-txt-link-freetext" href="https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/working_copy/humanv1.md">https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/working_copy/humanv1.md</a><br>
## Format
<br>
Some examples:
<br>
{"source.ip": ["192.0.43.8"], "source.asn": [16876, 40528]}
<br>
{"source.ip": ["10.0.0.1", "10.0.0.2"], "source.url":
[<a class="moz-txt-link-rfc2396E" href="http://example.com/">"http://example.com/"</a>, <a class="moz-txt-link-rfc2396E" href="http://example.net">"http://example.net"</a>]}
<br>
{"classification.taxonomy": ["information-content-security",
"fraud"], "classification.type":
["unauthorised-modification-of-information", "phishing"],
"source.url": [<a class="moz-txt-link-rfc2396E" href="http://example.com/">"http://example.com/"</a>], "source.ip": ["10.0.0.1",
"10.0.0.2"]}
<br>
<br>
In the bots' code, multiple values need to be taken care of. For
example, instead of:
<br>
<br>
ip_addr = event["source.ip"]
<br>
# do stuff
<br>
<br>
it is necessary to loop over the values:
<br>
<br>
for ip_addr in event["source.ip"]:
<br>
# do stuff
<br>
<br>
This logic is required for *all* fields which can have multiple
values, therefore nested loops may be necessary.
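<br>
<br>
As a rough sketch of what this could mean for bot code (not actual
IntelMQ code): the hypothetical helper `as_list` below accepts both the
old one-value form and the proposed list form, and `itertools.product`
stands in for the nested loops. A plain `dict` replaces IntelMQ's real
event class for illustration only.
<br>
<pre>
```python
from itertools import product


def as_list(value):
    """Wrap a scalar in a list, keep lists as they are, map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ip_asn_pairs(event):
    """Yield every (ip, asn) combination, equivalent to two nested loops.

    If either field is missing, nothing is yielded.
    """
    ips = as_list(event.get("source.ip"))
    asns = as_list(event.get("source.asn"))
    yield from product(ips, asns)


# Example taken from the Format section above.
event = {"source.ip": ["192.0.43.8"], "source.asn": [16876, 40528]}
pairs = list(ip_asn_pairs(event))
```
</pre>
Every additional multi-value field adds another level to the product,
which is where the extra complexity in each bot would come from.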
<br>
<br>
Everything which processes IntelMQ data needs to be adapted,
including databases. See the "Disadvantages" section below.
<br>
<br>
### Optional back-conversion ("value-explosion")
<br>
<br>
One variant/option of this IEP is to create a conversion layer
from the new multi-value format to the old one-value format by
creating multiple events with only one value per field. Using this
conversion, compatibility with external components can be kept,
while the advantages only exist inside the IntelMQ core (i.e. the
bots).
<br>
<br>
Examples:
<br>
{"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn":
["example.com"]}
<br>
-> {"source.ip": "127.0.0.1", "source.fqdn":
"example.com"}, {"source.ip": "127.0.0.2", "source.fqdn":
"example.com"}
<br>
{"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn":
["example.com", "example.org"]}
<br>
-> {"source.ip": "127.0.0.1", "source.fqdn":
"example.com"}, {"source.ip": "127.0.0.1", "source.fqdn":
"example.org"}, {"source.ip": "127.0.0.2", "source.fqdn":
"example.com"}, {"source.ip": "127.0.0.2", "source.fqdn":
"example.org"}
<br>
<br>
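The conversion shown in the examples above amounts to a Cartesian
product over all fields. A minimal sketch, assuming events are plain
dictionaries; the function name `explode` is made up here and is not
part of IntelMQ:
<br>
<pre>
```python
from itertools import product


def explode(event):
    """Convert one multi-value event into a list of one-value events.

    Builds the Cartesian product of all list-valued fields; scalar
    values are treated as single-item lists. A field holding an
    empty list results in no events at all.
    """
    keys = list(event)
    value_lists = [v if isinstance(v, list) else [v] for v in event.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


multi = {
    "source.ip": ["127.0.0.1", "127.0.0.2"],
    "source.fqdn": ["example.com", "example.org"],
}
singles = explode(multi)
```
</pre>
Note that the number of resulting events is the product of the list
lengths, so events with several multi-value fields can grow quickly.
<br>
<br>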
### What will change?
<br>
We'll change the behaviour of the current IntelMQ internal parsing
process, i.e. you'll be able to add multiple IP addresses to one
field. Data which would otherwise be handled as multiple events is
then merged into one event.
<br>
This will allow us to combine e.g. a domain with multiple IP
addresses into one event.
<br>
<br>
### Advantages
<br>
<br>
Supporting multiple values allows us to add multiple IP addresses
to one event. Compared to using multiple events with nearly
identical data, the multiple-value approach reduces data duplication
and has less overhead, but it increases complexity.
<br>
If multiple events were used instead, related events would
need to be linked together by other means (see section Alternatives
below).
<br>
<br>
### Disadvantages (breaking behaviour)
<br>
<br>
The complexity in IntelMQ and all linked components increases
without doubt. All components dealing with the IntelMQ-data need
to be adapted to deal with multiple values. This includes all
bots, but IntelMQ administrators need to adapt their
configurations (e.g. filters, etc.) as well.
<br>
<br>
Without the explosion variant, all connected databases (e.g.
PostgreSQL, SQLite, Elastic, MongoDB etc.) and all software which
processes data from IntelMQ need to be adapted as well. PostgreSQL
supports arrays for columns, but the schema conversion can be complex
and resource-hungry.
<br>
<br>
IntelMQ followed the KISS ("keep it simple, stupid")[4] principle
from its beginning. It is debatable whether multiple values break
with this principle.
<br>
<br>
[4]: <a class="moz-txt-link-freetext" href="https://en.wikipedia.org/wiki/KISS_principle">https://en.wikipedia.org/wiki/KISS_principle</a>
<br>
## Alternatives
<br>
<br>
An alternative to using multiple values per field is to set unique
identifiers (e.g. UUID) per event and let events with the same
origin have the same "parent" identifier. This way, related events
can be linked and compatibility is easier. Relating the events to
each other requires extra steps, but keeps to the KISS
principle. This approach will be described in IEP04.
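<br>
<br>
To illustrate the idea only (IEP04 may define this differently), here
is a small sketch which splits one observation into several one-value
events sharing a parent identifier. The field names `event.id` and
`event.parent_id` and the function `split_with_parent` are invented for
this example and are not part of IntelMQ's data format.
<br>
<pre>
```python
import uuid


def split_with_parent(base, field, values):
    """Create one single-value event per item, linked by a shared parent ID."""
    parent_id = str(uuid.uuid4())
    events = []
    for value in values:
        child = dict(base)  # shallow copy so the base event stays untouched
        child[field] = value
        child["event.id"] = str(uuid.uuid4())  # hypothetical field name
        child["event.parent_id"] = parent_id   # hypothetical field name
        events.append(child)
    return events


related = split_with_parent(
    {"source.fqdn": "example.com"}, "source.ip", ["10.0.0.1", "10.0.0.2"]
)
```
</pre>
Each event keeps exactly one value per field, so existing consumers
keep working; finding related events then means grouping by the parent
identifier.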
<br>
<br>
To solve the use-case of multiple classifications per event, the
primary and most important classification can be used instead of
multiple ones.
<br>
<br>
A possible solution for the classification use-case above would be
to use some sort of tagging, in short "tags". For example:
<br>
{
<br>
"source.ip": ["192.0.43.8"],
<br>
"source.asn": [16876, 40528],
<br>
"tags": ["ddos-amplifier", "info-disclosure", "mirai-botnet"]
<br>
}
<br>
<br>
## Other IoC processing formats
<br>
<br>
For reference, we describe the formats of other IoC-processing
systems similar to IntelMQ. Both formats, IDEA and n6, support
multiple values in different ways. If you know of other similar
formats supporting multiple values, please speak up!
<br>
<br>
### "IDEA"
<br>
<br>
The IDEA format, used by the CESNET-developed Warden, supports
multiple values for some fields. But the data format structure
differs clearly from IntelMQ's, as you can see in the example
below. The classification is defined per address, and network
ranges are possible as addresses, which is not supported in
IntelMQ.
<br>
IDEA was designed from scratch to overcome disadvantages of
Warden's previous data format.
<br>
<br>
Example:
<br>
"Source": [
<br>
{
<br>
"Type": ["Phishing"],
<br>
"IP4": ["192.168.0.2-192.168.0.5", "192.168.0.10/25"],
<br>
"IP6": ["2001:0db8:0000:0000:0000:ff00:0042::/112"],
<br>
"Hostname": ["example.com"],
<br>
"URL": [<a class="moz-txt-link-rfc2396E" href="http://example.com/cgi-bin/killemall">"http://example.com/cgi-bin/killemall"</a>],
<br>
"Proto": ["tcp", "http"],
<br>
"AttachHand": ["att1"],
<br>
"Netname": ["ripe:IANA-CBLK-RESERVED1"]
<br>
}
<br>
],
<br>
"Target": [
<br>
{
<br>
"Type": ["Backscatter", "OriginSpam"],
<br>
"Email": [<a class="moz-txt-link-rfc2396E" href="mailto:innocent@example.com">"innocent@example.com"</a>],
<br>
"Spoofed": true
<br>
},
<br>
{
<br>
"IP4": ["10.2.2.0/24"],
<br>
"Anonymised": true
<br>
}
<br>
]
<br>
<br>
Upstream documentation:
<br>
<a class="moz-txt-link-freetext" href="https://idea.cesnet.cz/en/index">https://idea.cesnet.cz/en/index</a>
<br>
<a class="moz-txt-link-freetext" href="https://warden.cesnet.cz/en/index">https://warden.cesnet.cz/en/index</a>
<br>
<br>
### n6
<br>
<br>
In the n6 format, the `addr` field is a list of objects with `ip`,
`asn`, `cc` and `dir` fields. `addr` is similar to IntelMQ's
`source` namespace, but `addr` has far fewer fields and the
"direction" of the address is given by a field inside the `addr`
item.
<br>
<br>
Example:
<br>
[{"ipv6": "abcd::1", "cc": "PL", "asn": 12345, "dir": "dst"}]
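<br>
<br>
A possible mapping from a one-value IntelMQ-like event to such an
address list could look like the sketch below. It is only an
illustration: the function `to_n6_addr` is hypothetical, the output
keys follow the description above (`ip`, `asn`, `cc`, `dir`), and the
use of `source.geolocation.cc` as the country-code field is an
assumption.
<br>
<pre>
```python
def to_n6_addr(event):
    """Build an n6-style address list from a one-value IntelMQ-like event."""
    directions = {"source": "src", "destination": "dst"}
    addr = []
    for namespace, direction in directions.items():
        ip = event.get(namespace + ".ip")
        if ip is None:
            continue  # no address in this namespace
        item = {"ip": ip, "dir": direction}
        asn = event.get(namespace + ".asn")
        if asn is not None:
            item["asn"] = asn
        cc = event.get(namespace + ".geolocation.cc")  # assumed field name
        if cc is not None:
            item["cc"] = cc
        addr.append(item)
    return addr


addr = to_n6_addr({
    "source.ip": "192.0.2.1",
    "source.asn": 64496,
    "source.geolocation.cc": "AT",
    "destination.ip": "198.51.100.7",
})
```
</pre>
With multiple values per field, it would additionally have to be
defined which ASN belongs to which IP address, which this sketch does
not attempt.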
<br>
<br>
Upstream documentation:
<br>
<a class="moz-txt-link-freetext" href="https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-addressfield">https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-addressfield</a>
<br>
<a class="moz-txt-link-freetext" href="https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-extendedaddressfield">https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-extendedaddressfield</a>
<br>
<br>
<br>
</blockquote>
<pre class="moz-signature" cols="72">--
// Sebastian Wagner <a class="moz-txt-link-rfc2396E" href="mailto:wagner@cert.at"><wagner@cert.at></a> - T: +43 1 5056416 7201
// CERT Austria - <a class="moz-txt-link-freetext" href="https://www.cert.at/">https://www.cert.at/</a>
// Eine Initiative der nic.at GmbH - <a class="moz-txt-link-freetext" href="https://www.nic.at/">https://www.nic.at/</a>
// Firmenbuchnummer 172568b, LG Salzburg</pre>
</body>
</html>