Re: [IntelMQ-dev] Decision on IEP04: IntelMQ Data Format - Meta-Information

17 May 2021

Given that we have not yet reached a consensus on the exact format
changes and that IntelMQ 3.0 is approaching - in fact we want to publish the
(first) release candidate at the end of May - I propose to postpone the
implementation of IEP03 to after IntelMQ 3.0, maybe 3.1. Implementing a
major format change in a rush only causes trouble.
We have at least two open discussions:
- How to store linking information using UUIDs:
https://github.com/certtools/ieps/issues/1
- And the specification of the format and type fields:
https://github.com/certtools/ieps/issues/3
Again I call all contributors and users to participate in the
discussion, on the mailing list or on GitHub.
Sebastian
On 4/22/21 3:24 PM, Sebastian Wagner wrote:
...
Dear IntelMQ Developers and Users
In today's hackathon we discussed IEP04 in detail and the proposal was
generally adopted. To cover additional use-cases, the UUIDs will be
extended to also express certain kinds of relations between events:
related events refer to each other by their UUIDs. The exact details of
this format will be discussed in the next days, on intelmq-dev or on GitHub
(https://github.com/certtools/ieps).
Thanks - again ;) - everybody for the feedback!
kind regards
Sebastian
On 3/30/21 5:56 PM, Sebastian Waldbauer wrote:
...
Dear IntelMQ Developers and Users,
Nowadays security incidents are more important than 10 years ago. As
IntelMQ can be used as a core element for automated security incident
handling, we would like to provide a way to share information with
other IntelMQ instances. This proposal is also an alternative to
IEP03 insofar as the "multiple values" problem can be solved by using
UUIDs to "link" related events in a backwards-compatible manner.
If you're interested, please let us know, so we could organize a
hackathon for further discussions about the specification of the
meta-information.
Previously this idea was discussed in [0] and [1].
[0]
https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur...
[1] https://github.com/certtools/intelmq/issues/1521
# IEP04: Internal Data Format: Meta Information and Data Exchange
To ease data exchange between two or more IntelMQ instances, we propose
adding some meta-information to the events.
"Linking" events could be based on the same principle that `git` uses
with parent hashes (we would call them UUIDs).

### TL;DR
Communication between two or more IntelMQ instances and data exchange
with a backwards-compatible format. P2P or centralized architecture
is a big topic, which has to be discussed after the format has been settled.
### Why is metadata important?
Short and simple: to avoid race conditions and to be able to
discard/drop events that other instances have already processed.
### Meta information
Metadata is used to transfer some general data which is not directly
related to the event itself. It is more or less just information to
keep events clear and sortable.
A message could look like:
{
    "meta": {
        "version": 1, # protocol version, so we are able to fall back to old versions too
        "uuid": {
            "current": "cert_at:aaaa-bbbb-cccc-dddd", # format to be decided
            "parent": "cert_at:xxxx-yyyy-zzzz-ffff" # format to be discussed; if not set -> current is the parent uuid
        },
        "type": "event",
        "format": "intelmq" # e.g. this field could contain "n6" or "idea", so the receiving component can decode on demand
    },
    "payload": { # normal intelmq data
        "source.ip": "127.0.0.1",
        "source.fqdn": "example.com",
        "raw": "<base64 blob>"
    }
}
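As a rough illustration, the envelope above could be assembled as in the following Python sketch. The helper `build_message` and the `<instance>:<uuid>` form of the identifier are assumptions made for this example only; the UUID format is still to be decided. The sketch also takes one possible reading of "if not set -> current is the parent uuid", namely that a root event carries its own UUID as parent.

```python
import base64
import json
import uuid
from typing import Optional


def build_message(payload: dict, instance: str, parent: Optional[str] = None) -> dict:
    """Wrap a flat IntelMQ payload in the proposed meta envelope (hypothetical helper)."""
    # Assumed identifier form "<instance>:<uuid>"; the real format is undecided.
    current = f"{instance}:{uuid.uuid4()}"
    return {
        "meta": {
            "version": 1,
            "uuid": {
                "current": current,
                # Assumption: without a parent, the event is its own parent.
                "parent": parent if parent is not None else current,
            },
            "type": "event",
            "format": "intelmq",
        },
        "payload": payload,
    }


payload = {
    "source.ip": "127.0.0.1",
    "source.fqdn": "example.com",
    "raw": base64.b64encode(b"original log line").decode("ascii"),
}
message = build_message(payload, "cert_at")
# A derived event links back to the event it was created from.
child = build_message(payload, "cert_at", parent=message["meta"]["uuid"]["current"])
print(json.dumps(child["meta"], indent=4))
```

Because the payload stays a plain flat dictionary, a component that only understands the IntelMQ 2.x format can still read `message["payload"]` unchanged.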
Tell us your opinion about adding non-standardized meta-information
fields (e.g. RTIR ticket number, origin, other local contact
information, and so on).
#### The UUID
For the UUID there are multiple options:
1. Generate a random 128 bit UUID
2. A list of entities, which dealt with this event already. For
example if an event was passed on from cert-at to cert-ee, the field
could look like `!cert-at!cert-ee`. A message sending loop can be
detected if the own name is already in this field upon reception.
3. Using CyCat: `publisher-short-name:project-short-name:UUID`. For
example: `cert-at:intelmq:72ddb00c-2d0a-4eea-b7ac-ae122b8e6c3b`, or
`cert-pl:n6:f60c9fb9-81f9-4e0b-8a44-ea41326a15b3`. Some more research
and discussion is required before the implementation of this option.
Have a look at https://www.cycat.org/services/concept/ for more details.
4. A hash: A benefit of using a hash is that we are able to recalculate
it on every IntelMQ instance.
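To make the options easier to compare, here is a minimal Python sketch of each. All function names are hypothetical. For option 4, the choice of SHA-256 over a canonical JSON dump of the payload is an assumption of this example, since the proposal does not say what would be hashed or how.

```python
import hashlib
import json
import uuid


def random_uuid() -> str:
    """Option 1: a random (version 4) UUID, 128 bits wide."""
    return str(uuid.uuid4())


def append_to_path(path: str, own_name: str) -> str:
    """Option 2: add the own name to the entity list, refusing message loops."""
    seen = [name for name in path.split("!") if name]
    if own_name in seen:
        raise ValueError(f"message loop detected: {own_name} already in {path!r}")
    return f"{path}!{own_name}"


def cycat_style_id(publisher: str, project: str) -> str:
    """Option 3: publisher-short-name:project-short-name:UUID."""
    return f"{publisher}:{project}:{uuid.uuid4()}"


def content_hash(payload: dict) -> str:
    """Option 4: a hash that every instance can recalculate from the payload."""
    # Sorting the keys makes the result independent of field order (assumption).
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(append_to_path("!cert-at", "cert-ee"))  # prints "!cert-at!cert-ee"
print(cycat_style_id("cert-at", "intelmq"))
print(content_hash({"source.ip": "127.0.0.1", "source.fqdn": "example.com"}))
```

Note the trade-off: only option 4 yields the same identifier on two instances that independently see the same data, while options 1 and 3 require the identifier to travel with the event.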
### Exporting events to other systems
In IntelMQ 2.x the events consist only of the "payload" and carry no meta
information. For local storage like file output or databases, the
meta information may not be relevant in some use-cases. So it needs
to be possible to export events *without* meta information, which is
also the backwards-compatible behaviour.
The "type" field exists in the current format as "__type" in the flat
payload structure. In the output bots there's currently a boolean
parameter `message_with_type` to include the field `__type` in the
"export".
For optionally exporting meta-information like uuid or format, a
similar logic could be used.
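One way to picture this "similar logic" is the sketch below. It is not taken from any IntelMQ output bot: the function `export_event` and the parameter `message_with_meta` are hypothetical, and only `message_with_type` and `__type` are names from the current format. Copying the type value unchanged and nesting the remaining fields under a `meta` key are likewise assumptions.

```python
def export_event(message: dict, message_with_type: bool = False,
                 message_with_meta: bool = False) -> dict:
    """Flatten an enveloped message for export (illustrative sketch only)."""
    exported = dict(message["payload"])  # default: payload only, as in 2.x
    meta = message.get("meta", {})
    if message_with_type and "type" in meta:
        # Existing behaviour: the type appears as "__type" in the flat structure.
        exported["__type"] = meta["type"]
    if message_with_meta:
        # Hypothetical switch: also export the remaining meta fields.
        exported["meta"] = {key: value for key, value in meta.items() if key != "type"}
    return exported


message = {
    "meta": {"version": 1, "type": "event", "format": "intelmq",
             "uuid": {"current": "cert_at:aaaa-bbbb-cccc-dddd"}},
    "payload": {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
}
print(export_event(message))
print(export_event(message, message_with_type=True, message_with_meta=True))
```

With both switches left at their defaults the export is exactly the payload, which matches the backwards-compatible behaviour described above.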
### How can data exchange work?
This now depends on how IntelMQ instances can communicate, either
peer-to-peer or via a central data hub. Both of them have pros
and cons.
#### P2P ( Peer 2 Peer )
Decentralized network
Less downtime: a downtime of one instance does not affect the whole network
Better privacy: data is not shared with an unrelated instance
More secure: data can optionally be encrypted (key exchange between instances?)
Decentralized and local maintenance
~ Network latency depends on server locations
Networking issues may occur
How data exchange between instances could look:
1) Instance A has events which should be relayed to Instance B & C,
because they're not sure who the actual receiver should be
2) Instance A ensures all messages have a UUID
3) Instance A sends the data to Instance B & Instance C
4) Instance B checks the data & they're sure that the data should be
for Instance C
5) Instance C receives data from Instance A & Instance B
6) Instance C checks the UUID, which is the same & drops the package
from Instance B
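The steps above can be played through in a few lines of Python. This is a toy simulation without any networking; the `Instance` class and its in-memory set of seen UUIDs are assumptions of this sketch, not part of the proposal.

```python
class Instance:
    """Toy stand-in for an IntelMQ instance that remembers seen UUIDs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen = set()      # UUIDs this instance has already accepted
        self.processed = []    # messages that passed de-duplication

    def receive(self, message: dict) -> bool:
        """Return True if the message is new, False if it was dropped."""
        uid = message["meta"]["uuid"]["current"]
        if uid in self.seen:
            return False
        self.seen.add(uid)
        self.processed.append(message)
        return True


instance_b = Instance("B")
instance_c = Instance("C")

# Step 2: Instance A made sure the message carries a UUID.
message = {
    "meta": {"uuid": {"current": "cert_at:aaaa-bbbb-cccc-dddd"}},
    "payload": {"source.ip": "127.0.0.1"},
}

instance_b.receive(message)                    # step 3: A sends to B ...
accepted_from_a = instance_c.receive(message)  # ... and to C
accepted_from_b = instance_c.receive(message)  # steps 4-6: B relays to C
print(accepted_from_a, accepted_from_b)        # prints "True False"
```

In a real deployment the set of seen UUIDs would need persistence and some expiry policy, both of which are outside the scope of this sketch.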
#### (Central) Data hub

Less maintenance: the hub is maintained by the hub administrator
Central data storage (reports can optionally be cached to be downloaded later)
~ Central data analysis (e.g. statistics) is possible
~ Network latency depends on server locations
Single point of failure: if network problems occur, no exchange is possible
As already seen above, data exchange here would be less complicated.
The sending may look like:
1) Instance A has events which should be relayed to Instance B (e.g.
in a different country)
2) Instance A ensures all messages have a UUID
3) Instance A sends these messages to the data hub
The reception side can look like:
1) Instance B connects to the central instance
2) Instance B queries and downloads all available messages
3) Upon reception, all messages are de-duplicated based on the UUID:
  a) If the UUID is already known, discard the message
  b) If the UUID has not been seen before, continue with processing
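A minimal sketch of the hub variant, again in Python and again without real networking: the `DataHub` class, its per-recipient queues and the function `deduplicate` are hypothetical names chosen for this example.

```python
from typing import Dict, List, Set


class DataHub:
    """In-memory stand-in for the central data hub (illustrative only)."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[dict]] = {}

    def publish(self, recipient: str, message: dict) -> None:
        """Sending side, step 3: store a message for a recipient."""
        self._queues.setdefault(recipient, []).append(message)

    def download(self, recipient: str) -> List[dict]:
        """Reception side, step 2: hand out and clear all waiting messages."""
        return self._queues.pop(recipient, [])


def deduplicate(messages: List[dict], known_uuids: Set[str]) -> List[dict]:
    """Reception side, step 3: keep only messages with unseen UUIDs."""
    fresh = []
    for message in messages:
        uid = message["meta"]["uuid"]["current"]
        if uid in known_uuids:
            continue              # a) UUID already known: discard
        known_uuids.add(uid)      # b) not seen before: continue processing
        fresh.append(message)
    return fresh


def make(uid: str) -> dict:
    return {"meta": {"uuid": {"current": uid}}, "payload": {}}


hub = DataHub()
for uid in ("cert_at:1", "cert_at:1", "cert_at:2"):
    hub.publish("B", make(uid))

known: Set[str] = set()
fresh = deduplicate(hub.download("B"), known)
print(len(fresh))  # prints 2
```

Clearing the queue on download is a simplification; a hub that caches reports for later download, as mentioned above, would keep them and track delivery per recipient instead.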
To sum up, both exchange variants are useful. More research is
needed, e.g. on a mixed infrastructure with centralized parts that can
also work in a decentralized way. However, this is neither the purpose
nor the aim of this IEP.
-- 
// Sebastian Wagner wagner@cert.at - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg