<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Dear community,</p>

    <p>In today's hackathon we discussed IEP03 in detail.</p>

    <p>As described in the original proposal, IEP03 was based on the

      IntelMQ 3.0 architecture document[0]. The discussion we just had
      showed that there are definitely use-cases which can be enhanced
      by such a data format change and that IntelMQ can evolve in such a
      direction in the future. It was also pointed out that the change
      does not necessarily break KISS, as the implementation should be
      just as complex as it needs to be to solve the problem but no more
      complex. However, the use-cases known as of IntelMQ 3.0 are not
      enough to justify implementing this major change at this stage, i.e.
      for IntelMQ 3.0, given its large negative impact. Other use-cases which

      support such a feature are not yet known well enough in detail and

      need to be collected and described on a larger scale first, with a

      vision for IntelMQ 4.0 in mind. The IntelMQ Architecture Board,

      which is being started now, will support this process. IEP04 will

      be adapted to incorporate the use-cases covered by IEP03.</p>

    <p>Thanks again to everyone for your valuable input and your

      engagement to bring IntelMQ forward!<br>

    </p>

    <p>best regards<br>

      Sebastian<br>

    </p>

    <p>[0]

<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements">https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements</a></p>

    <div class="moz-cite-prefix">On 3/30/21 5:53 PM, Sebastian Waldbauer

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:1c03bb68-0bd8-b1ee-5dcb-452df28d9ffa@cert.at">Dear

      IntelMQ Developers and Users,

      <br>

      <br>

      an evaluation of current challenges with the internal data format

      led to the idea of allowing multiple values for one field in

      IntelMQ 3.0 (scheduled for June 2021)[0].

      <br>

      The idea is described below, including various advantages and

      disadvantages. We appreciate your input, opinions and analysis of
      further implications of this idea.
      <br>
      We plan to evaluate the feedback received in two weeks.

      <br>

      <br>

      [0]

<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements">https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements</a><br>

           "The new IDF shall support (sorted) lists of IPs, domains,

      taxonomy categories, etc. By convention the most relevant item in

      such a list MUST be the first item in the sorted list."

      <br>

<a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-interoperability-with-certpls">https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-interoperability-with-certpls</a>-[0]:

      n6-system

      <br>

           "Since the new IDF shall support multiple values, mapping to

      n6 should be rather easy."

      <br>

      <br>

      ## Use-cases

      <br>

      ### Network information

      <br>

      IntelMQ's format currently allows for *exactly one* value per

      field. For example, every event can have *one* `source.ip` and

      *one* `source.fqdn`. In some use-cases, multiple values can be

      useful, for example when querying DNS information. One domain

      (`source.fqdn`) can point to multiple IP addresses (`source.ip`).

      The reverse, multiple domains pointing to the same IP address,
      is also very common. The use-case that first appeared was that one IP
      address can be part of multiple Autonomous Systems
      (`source.asn`).[1][2][3]

      <br>

      <br>

      See the examples below in section Format.

      <br>

      <br>

      [1]: "Multiple ASNs/networks per IP? #543"

      <a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/543">https://github.com/certtools/intelmq/issues/543</a>

      <br>

      [2]: "BOT: DNS lookup #373"

      <a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/373">https://github.com/certtools/intelmq/issues/373</a>

      <br>

      [3]: "reverse DNS: Only first record is used"
      <a class="moz-txt-link-freetext" href="https://github.com/certtools/intelmq/issues/877">https://github.com/certtools/intelmq/issues/877</a>

      <br>

      <br>

      ### Classification

      <br>

      Another use-case is to use multiple classifications.[5] For

      example, if a website was hacked and used for a phishing page, it

      can be assigned two classifications:

      <br>

      For the hacking: Taxonomy: information-content-security, type:

      unauthorised-modification-of-information

      <br>

      For the phishing page: Taxonomy: fraud, type: phishing

      <br>

      <br>

      Another example is reachable network services which should not
      be accessible from the internet. Shadowserver provides a lot of this
      data.
      <br>
      Open XDMCP instances are both DDoS amplifiers and potentially
      unwanted accessible systems. Therefore both classifications apply:

      <br>

      Taxonomy: vulnerable, type: ddos-amplifier

      <br>

      Taxonomy: vulnerable, type: potentially-unwanted-accessible-system

      <br>

      <br>

      A list of all fields of the RSIT can be found in the RSIT
      repository.[6]

      <br>

      <br>

      [5]:

<a class="moz-txt-link-freetext" href="https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/Documentation/Usage.md#user-content-multiple-classifications">https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/Documentation/Usage.md#user-content-multiple-classifications</a><br>

      [6]:

<a class="moz-txt-link-freetext" href="https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/working_copy/humanv1.md">https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/blob/master/working_copy/humanv1.md</a><br>

      ## Format

      <br>

      Some examples:

      <br>

      {"source.ip": ["192.0.43.8"], "source.asn": [16876, 40528]}

      <br>

      {"source.ip": ["10.0.0.1", "10.0.0.2"], "source.url":

      [<a class="moz-txt-link-rfc2396E" href="http://example.com/">"http://example.com/"</a>, <a class="moz-txt-link-rfc2396E" href="http://example.net">"http://example.net"</a>]}

      <br>

      {"classification.taxonomy": ["information-content-security",

      "fraud"], "classification.type":

      ["unauthorised-modification-of-information", "phishing"],

      "source.url": [<a class="moz-txt-link-rfc2396E" href="http://example.com/">"http://example.com/"</a>], "source.ip": ["10.0.0.1",

      "10.0.0.2"]}

      <br>

      <br>

      In the bots' code, multiple values need to be taken care of. For
      example, instead of:

      <br>

      <br>

          ip_addr = event["source.ip"]

      <br>

          # do stuff

      <br>

      <br>

      it is necessary to loop over the values:

      <br>

      <br>

          for ip_addr in event["source.ip"]:

      <br>

              # do stuff

      <br>

      <br>

      This logic is required for *all* fields which can have multiple

      values, therefore nested loops may be necessary.

      <br>
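      <br>
      As a rough illustration of the nested-loop point above, the
      following sketch iterates over every combination of two
      multi-valued fields. The helper names `as_list` and
      `iter_combinations` are hypothetical and not part of IntelMQ; the
      sketch also assumes that a bot may still encounter single values
      and wraps them into lists.
      <br>
<pre>
```python
from itertools import product


def as_list(value):
    """Wrap a single value in a list; leave lists unchanged."""
    return value if isinstance(value, list) else [value]


def iter_combinations(event, fields):
    """Return an iterator over all value combinations of `fields`."""
    return product(*(as_list(event[field]) for field in fields))


event = {
    "source.ip": ["10.0.0.1", "10.0.0.2"],
    "source.url": ["http://example.com/", "http://example.net"],
}

# Equivalent to two nested for-loops, one per multi-valued field.
pairs = list(iter_combinations(event, ["source.ip", "source.url"]))
for ip_addr, url in pairs:
    print(ip_addr, url)  # placeholder for "do stuff"
```
</pre>
      Using `itertools.product` keeps the code flat, but the amount of
      work still grows with the product of the list lengths.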

      <br>

      Everything which processes IntelMQ data needs to be adapted,
      including databases. See the "Disadvantages" section below.

      <br>

      <br>

      ### Optional back-conversion ("value-explosion")

      <br>

      <br>

      One variant/option of this IEP is to create a conversion layer

      from the new multi-value format to the old one-value format by

      creating multiple events with only one value per field. Using this

      conversion, compatibility with external components can be kept,

      while the advantages only exist inside the IntelMQ core (i.e. the
      bots).

      <br>

      <br>

      Examples:

      <br>

      {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn":
      ["example.com"]}
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp;-> {"source.ip": "127.0.0.1", "source.fqdn":
      "example.com"}, {"source.ip": "127.0.0.2", "source.fqdn":
      "example.com"}

      <br>

      {"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn":

      ["example.com", "example.org"]}

      <br>

          -> {"source.ip": "127.0.0.1", "source.fqdn":

      "example.com"}, {"source.ip": "127.0.0.1", "source.fqdn":

      "example.org"}, {"source.ip": "127.0.0.2", "source.fqdn":

      "example.com"}, {"source.ip": "127.0.0.2", "source.fqdn":

      "example.org"}

      <br>
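      <br>
      The conversion described in this section could be sketched roughly
      as follows: one single-value event is emitted per element of the
      Cartesian product of all value lists. The function name `explode`
      is hypothetical, and the sketch assumes events are plain
      dictionaries rather than IntelMQ message objects.
      <br>
<pre>
```python
from itertools import product


def explode(event):
    """Return a list of single-value events for one multi-value event."""
    keys = list(event)
    value_lists = [
        value if isinstance(value, list) else [value]
        for value in event.values()
    ]
    # One output event per element of the Cartesian product of all lists.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


multi = {
    "source.ip": ["127.0.0.1", "127.0.0.2"],
    "source.fqdn": ["example.com", "example.org"],
}
single_events = explode(multi)
for single in single_events:
    print(single)
```
</pre>
      The number of resulting events is the product of the list lengths,
      so it grows quickly with several multi-valued fields. In this
      sketch an empty list yields no events at all, which a real
      implementation would probably need to handle explicitly.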

      <br>

      ### What will change?

      <br>

      We'll change the behaviour of the current IntelMQ internal parsing
      process, i.e. you'll be able to add multiple IP addresses to one
      field. Data that would otherwise be handled as multiple events is
      merged into one event.
      <br>
      This will allow us to combine, e.g., a domain with multiple IP
      addresses into one event.

      <br>

      <br>

      ### Advantages

      <br>

      <br>

      Supporting multiple values allows us to add multiple IP addresses

      to one event. As opposed to using multiple events with nearly
      identical data, the multiple-value approach reduces data duplication
      and has less overhead, although the complexity increases.
      <br>
      If multiple events were used instead, related events would
      need to be linked together by other means (see section Alternatives
      below).

      <br>

      <br>

      ### Disadvantages (breaking behaviour)

      <br>

      <br>

      The complexity in IntelMQ and all linked components increases

      without doubt. All components dealing with the IntelMQ-data need

      to be adapted to deal with multiple values. This includes all

      bots, but IntelMQ administrators need to adapt their

      configurations (e.g. filters, etc.) as well.

      <br>

      <br>

      Without the explosion variant, all connected databases (e.g.
      PostgreSQL, SQLite, Elastic, MongoDB etc.) and all software which
      processes data from IntelMQ need to be adapted as well. PostgreSQL
      supports arrays for columns, but the schema conversion can be
      complex and resource-hungry.

      <br>

      <br>

      IntelMQ followed the KISS ("keep it simple, stupid")[4] principle

      from its beginning. It is debatable whether multiple values break
      with this principle.

      <br>

      <br>

      [4]: <a class="moz-txt-link-freetext" href="https://en.wikipedia.org/wiki/KISS_principle">https://en.wikipedia.org/wiki/KISS_principle</a>

      <br>

      ## Alternatives

      <br>

      <br>

      An alternative to using multiple values per field is to set unique

      identifiers (e.g. UUID) per event and let events with the same

      origin have the same "parent" identifier. This way, related events

      can be linked and compatibility is easier. Relating the events to
      each other requires extra steps, but keeps to the KISS
      principle. This approach will be described in IEP04.

      <br>
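      <br>
      A minimal sketch of this alternative, under stated assumptions:
      an intermediate result with several values (for example from a DNS
      lookup) is split into single-value events that share a common
      parent identifier. The field names `uuid` and `parent_uuid` and
      the function `split_linked` are hypothetical and are not taken
      from IEP04.
      <br>
<pre>
```python
import uuid


def split_linked(event, field):
    """Split one multi-valued field into linked single-value events."""
    parent_id = str(uuid.uuid4())  # shared by all events of this origin
    children = []
    for value in event[field]:
        child = dict(event)  # shallow copy of the original event
        child[field] = value
        child["uuid"] = str(uuid.uuid4())  # unique per event
        child["parent_uuid"] = parent_id
        children.append(child)
    return children


lookup = {"source.fqdn": "example.com", "source.ip": ["10.0.0.1", "10.0.0.2"]}
linked = split_linked(lookup, "source.ip")
for child in linked:
    print(child["source.ip"], child["uuid"], child["parent_uuid"])
```
</pre>
      Every resulting event keeps exactly one value per field, so
      existing consumers keep working; only components that want to
      relate events need to look at the parent identifier.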

      <br>

      To solve the use-case of multiple classifications per event, the

      primary and most important classification can be used instead of

      multiple ones.

      <br>

      <br>

      A possible solution for the classification use-case above would be
      some sort of tagging, in short "tags". For example:

      <br>

      {

      <br>

         "source.ip": ["192.0.43.8"],

      <br>

         "source.asn": [16876, 40528],

      <br>

         "tags": ["ddos-amplifier", "info-disclosure", "mirai-botnet"]

      <br>

      }

      <br>
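      <br>
      To illustrate how such tags might be consumed, here is a small
      sketch of a filter that matches events carrying at least one of a
      set of wanted tags. The `tags` field is only the proposal from the
      example above, and `has_any_tag` is a hypothetical helper, not an
      existing IntelMQ function.
      <br>
<pre>
```python
def has_any_tag(event, wanted):
    """Return True if the event carries at least one of the wanted tags."""
    return not set(event.get("tags", [])).isdisjoint(wanted)


tagged_event = {
    "source.ip": ["192.0.43.8"],
    "source.asn": [16876, 40528],
    "tags": ["ddos-amplifier", "info-disclosure", "mirai-botnet"],
}

print(has_any_tag(tagged_event, {"ddos-amplifier", "phishing"}))
print(has_any_tag(tagged_event, {"phishing"}))
```
</pre>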

      <br>

      ## Other IoC processing formats

      <br>

      <br>

      For reference, we describe the formats of other IoC-processing

      systems similar to IntelMQ. Both formats, IDEA and n6, support
      multiple values in different ways. If you know of other similar
      formats supporting multiple values, please speak up!

      <br>

      <br>

      ### "IDEA"

      <br>

      <br>

      The IDEA-format, used by CESNET-developed Warden, supports

      multiple values for some fields. But the data format structure

      differs clearly from IntelMQ's, as you can see in the example

      below. The classification is defined per address, and network
      ranges are possible as addresses, which is not supported in
      IntelMQ.

      <br>

      IDEA was designed from scratch to overcome disadvantages of

      Warden's previous data format.

      <br>

      <br>

      Example:

      <br>

         "Source": [

      <br>

            {

      <br>

               "Type": ["Phishing"],

      <br>

               "IP4": ["192.168.0.2-192.168.0.5", "192.168.0.10/25"],

      <br>

               "IP6": ["2001:0db8:0000:0000:0000:ff00:0042::/112"],

      <br>

               "Hostname": ["example.com"],

      <br>

               "URL": [<a class="moz-txt-link-rfc2396E" href="http://example.com/cgi-bin/killemall">"http://example.com/cgi-bin/killemall"</a>],

      <br>

               "Proto": ["tcp", "http"],

      <br>

               "AttachHand": ["att1"],

      <br>

               "Netname": ["ripe:IANA-CBLK-RESERVED1"]

      <br>

            }

      <br>

         ],

      <br>

         "Target": [

      <br>

            {

      <br>

               "Type": ["Backscatter", "OriginSpam"],

      <br>

               "Email": [<a class="moz-txt-link-rfc2396E" href="mailto:innocent@example.com">"innocent@example.com"</a>],

      <br>

               "Spoofed": true

      <br>

            },

      <br>

            {

      <br>

               "IP4": ["10.2.2.0/24"],

      <br>

               "Anonymised": true

      <br>

            }

      <br>

         ]

      <br>

      <br>

      Upstream documentation:

      <br>

      <a class="moz-txt-link-freetext" href="https://idea.cesnet.cz/en/index">https://idea.cesnet.cz/en/index</a>

      <br>

      <a class="moz-txt-link-freetext" href="https://warden.cesnet.cz/en/index">https://warden.cesnet.cz/en/index</a>

      <br>

      <br>

      ### n6

      <br>

      <br>

      In the n6 format, the `addr` field is a list of objects with `ip`,
      `asn`, `cc` and `dir` fields. `addr` is similar to IntelMQ's
      `source` namespace, but `addr` is much smaller and the
      "direction" of the address is given by a field inside the `addr`
      item.

      <br>

      <br>

      Example:

      <br>

      [{"ipv6": "abcd::1", "cc": "PL", "asn": 12345, "dir": "dst"}]

      <br>

      <br>

      Upstream documentation:

      <br>

<a class="moz-txt-link-freetext" href="https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-addressfield">https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-addressfield</a>

      <br>

<a class="moz-txt-link-freetext" href="https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-extendedaddressfield">https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-extendedaddressfield</a>

      <br>

      <br>

      <br>


    </blockquote>

    <pre class="moz-signature" cols="72">-- 

// Sebastian Wagner <a class="moz-txt-link-rfc2396E" href="mailto:wagner@cert.at">&lt;wagner@cert.at&gt;</a> - T: +43 1 5056416 7201

// CERT Austria - <a class="moz-txt-link-freetext" href="https://www.cert.at/">https://www.cert.at/</a>

// Eine Initiative der nic.at GmbH - <a class="moz-txt-link-freetext" href="https://www.nic.at/">https://www.nic.at/</a>

// Firmenbuchnummer 172568b, LG Salzburg</pre>

  </body>

</html>