Dear IntelMQ Developers and Users,
Nowadays, security incidents are more important than they were 10 years ago. As IntelMQ can be used as a core element for automated security incident handling, we would like to provide a way to share information with other IntelMQ instances. This proposal is also an alternative to IEP03 insofar as the "multiple values" problem can be solved by using UUIDs to "link" related events in a backwards-compatible manner.
If you're interested, please let us know, so we can organize a hackathon for further discussion of the meta-information specification. This idea was previously discussed in [0] and [1].
[0] https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur...
[1] https://github.com/certtools/intelmq/issues/1521

# IEP04: Internal Data Format: Meta Information and Data Exchange

To ease data exchange between two or more IntelMQ instances, we propose adding some meta-information to the events. "Linking" events could follow the same principle that `git` uses: parent hashes (we would call them UUIDs).

### TL;DR

Let two or more IntelMQ instances communicate and exchange data in a backwards-compatible format. Whether the architecture is P2P or centralized is a big topic, which has to be discussed once the format is settled.

### Why is metadata important?

Short and simple: to avoid race conditions and to be able to discard already-processed events coming from other instances.

### Meta information

Metadata carries general data that is not directly related to the event itself. It is more or less just information that keeps events clear and sortable.
A message could look like:

    {
        "meta": {
            "version": 1,  # protocol version, so we can fall back to old versions too
            "uuid": {
                "current": "cert_at:aaaa-bbbb-cccc-dddd",  # format to be decided
                "parent": "cert_at:xxxx-yyyy-zzzz-ffff"  # format to be discussed; if not set, current is the parent uuid
            },
            "type": "event",
            "format": "intelmq"  # e.g. this field could contain "n6" or "idea", so the receiving component can decode on demand
        },
        "payload": {  # normal intelmq data
            "source.ip": "127.0.0.1",
            "source.fqdn": "example.com",
            "raw": "<base64-blob>"
        }
    }

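The envelope above is annotated with comments, so it is not valid JSON as written. A minimal Python sketch of how such a message might be built and unpacked follows. The helper names `wrap_event` and `unwrap_event`, the `publisher` argument and the `publisher:uuid4` ID format are assumptions made for illustration; the proposal leaves the UUID format open.

```python
import base64
import json
import uuid

PROTOCOL_VERSION = 1  # assumption: mirrors "version": 1 in the example


def wrap_event(payload, publisher="cert_at", parent=None):
    """Wrap a flat IntelMQ-style payload in the proposed meta envelope."""
    current = f"{publisher}:{uuid.uuid4()}"  # placeholder ID format
    return {
        "meta": {
            "version": PROTOCOL_VERSION,
            # If no parent is given, the event is treated as its own parent.
            "uuid": {"current": current, "parent": parent or current},
            "type": "event",
            "format": "intelmq",
        },
        "payload": payload,
    }


def unwrap_event(message):
    """Return the bare payload, i.e. the IntelMQ 2.x view of the event."""
    if message["meta"]["version"] > PROTOCOL_VERSION:
        raise ValueError("unsupported protocol version")
    return message["payload"]


event = {
    "source.ip": "127.0.0.1",
    "source.fqdn": "example.com",
    "raw": base64.b64encode(b"original line").decode("ascii"),
}
message = wrap_event(event)
wire = json.dumps(message)  # valid JSON: no comments, all keys quoted
assert unwrap_event(json.loads(wire)) == event
```

Keeping the payload untouched inside the envelope means a receiver that only understands the old format can still work with `unwrap_event(...)` alone.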
Please tell us your opinion on adding non-standardized meta-information fields (e.g. RTIR ticket number, origin, other local contact information, and so on).

#### The UUID

For the UUID there are multiple options:

1. Generate a random 128-bit UUID.
2. A list of entities that have already dealt with this event. For example, if an event was passed on from cert-at to cert-ee, the field could look like `!cert-at!cert-ee`. A message-sending loop can be detected if the instance's own name is already in this field upon reception.
3. Using CyCat: `publisher-short-name:project-short-name:UUID`. For example: `cert-at:intelmq:72ddb00c-2d0a-4eea-b7ac-ae122b8e6c3b`, or `cert-pl:n6:f60c9fb9-81f9-4e0b-8a44-ea41326a15b3`. Some more research and discussion is required before this option can be implemented. Have a look at https://www.cycat.org/services/concept/ for more details.
4. A hash: the benefit of a hash is that it can be recalculated on every IntelMQ instance.
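
To make the four options easier to compare, here is a rough Python sketch of each. All function names are hypothetical, and the canonical-JSON plus SHA-256 scheme for option 4 is only one possible choice; the proposal does not name a hash function or a canonical form.

```python
import hashlib
import json
import uuid


def random_id():
    """Option 1: a random 128-bit UUID (version 4)."""
    return str(uuid.uuid4())


def append_hop(path, own_name):
    """Option 2: append our name to a '!a!b' path and reject loops."""
    hops = [hop for hop in path.split("!") if hop]
    if own_name in hops:
        raise ValueError(f"loop detected: {own_name} already handled this event")
    return path + "!" + own_name


def cycat_style_id(publisher, project):
    """Option 3: publisher-short-name:project-short-name:UUID."""
    return f"{publisher}:{project}:{uuid.uuid4()}"


def content_hash(payload):
    """Option 4: a hash that every instance can recompute from the payload."""
    # Sorting keys makes the result independent of dict ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


path = append_hop(append_hop("", "cert-at"), "cert-ee")
assert path == "!cert-at!cert-ee"
```

Note the trade-off the sketch makes visible: options 1 and 3 identify one particular message, while option 4 gives two instances the same identifier for the same content without any coordination.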

### Exporting events to other systems

In IntelMQ 2.x, events consist only of the "payload" and carry no meta information. For local storage such as file output or databases, the meta information may not be relevant in some use cases. So it needs to be possible to export events *without* meta information, which is also the backwards-compatible behaviour.
The "type" field exists in the current format as "__type" in the flat payload structure. The output bots currently have a boolean parameter `message_with_type` to include the field `__type` in the "export". Similar logic could be used for optionally exporting meta-information like uuid or format.
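
As a sketch of what "similar logic" could mean, the function below exports only the payload by default and includes the envelope on request. `export_event` and the `with_meta` switch are hypothetical and not existing IntelMQ API; `with_type` merely imitates the behaviour described for `message_with_type`, and the capitalised `"Event"` value is an assumption.

```python
import json


def export_event(message, with_type=False, with_meta=False):
    """Serialize an event for an output, payload-only by default."""
    if with_meta:
        # Hypothetical switch: export the full envelope including meta.
        return json.dumps(message, sort_keys=True)
    exported = dict(message["payload"])
    if with_type:
        # Imitates message_with_type: add "__type" to the flat payload.
        exported["__type"] = message["meta"]["type"].capitalize()
    return json.dumps(exported, sort_keys=True)


message = {
    "meta": {"version": 1, "type": "event", "format": "intelmq"},
    "payload": {"source.ip": "127.0.0.1"},
}
# Default export is the backwards-compatible, payload-only form.
assert json.loads(export_event(message)) == {"source.ip": "127.0.0.1"}
```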

### How can data exchange work?

This depends on how IntelMQ instances communicate: either peer-to-peer or via a central data hub. Both have pros and cons.

#### P2P (Peer-to-Peer)

Decentralized network

+ Less downtime: an outage of one instance does not affect the whole network
+ Better privacy: data is not shared with unrelated instances
+ More secure: data can optionally be encrypted (key exchange between instances?)
+ Decentralized and local maintenance
~ Network latency depends on server locations
- Networking issues may occur

How data exchange could look between instances:

1) Instance A has events which should be relayed to Instance B and Instance C, because A is not sure who the actual receiver should be
2) Instance A ensures all messages have a UUID
3) Instance A sends the data to Instance B and Instance C
4) Instance B checks the data, concludes that it is meant for Instance C, and forwards it
5) Instance C receives the data from both Instance A and Instance B
6) Instance C checks the UUID, sees that it is the same, and drops the copy from Instance B
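
The six steps above can be simulated in a few lines of Python. This is only a toy model under stated assumptions: the `Instance` class is hypothetical, no networking is involved, and the set of seen UUIDs is kept in memory.

```python
class Instance:
    """A toy IntelMQ instance that remembers which UUIDs it has seen."""

    def __init__(self, name):
        self.name = name
        self.seen = set()
        self.processed = []

    def receive(self, message, sender):
        """Process a message once; return False if it is a duplicate."""
        event_id = message["meta"]["uuid"]["current"]
        if event_id in self.seen:
            return False  # already processed: drop the copy
        self.seen.add(event_id)
        self.processed.append((sender, message["payload"]))
        return True


b, c = Instance("B"), Instance("C")
message = {
    "meta": {"uuid": {"current": "cert_at:aaaa-bbbb-cccc-dddd"}},
    "payload": {"source.ip": "127.0.0.1"},
}

b.receive(message, sender="A")                    # step 3: A sends to B ...
accepted_from_a = c.receive(message, sender="A")  # ... and to C
accepted_from_b = c.receive(message, sender="B")  # steps 4-6: B forwards, C drops
assert accepted_from_a and not accepted_from_b
```

In a real deployment the `seen` set would need persistence and an expiry policy, which the proposal does not discuss.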

#### (Central) Data hub

+ Less maintenance: the hub is maintained by the hub administrator
+ Central data storage (reports can optionally be cached to be downloaded later)
~ Central data analysis (e.g. statistics) is possible
~ Network latency depends on server locations
- Single point of failure: if network problems occur, no exchange is possible

Data exchange here would be less complicated. The sending side may look like:

1) Instance A has events which should be relayed to Instance B (e.g. in a different country)
2) Instance A ensures all messages have a UUID
3) Instance A sends these messages to the data hub

The reception side can look like:

1) Instance B connects to the central instance
2) Instance B queries and downloads all available messages
3) Upon reception, all messages are de-duplicated based on the UUID:
   a) If the UUID is already known, discard the message
   b) If the UUID has not been seen before, continue with processing
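
The hub variant can be sketched the same way. `DataHub` and `receive_from_hub` are hypothetical names, and the in-memory per-recipient queue only stands in for whatever storage and transport a real hub would use.

```python
class DataHub:
    """A toy central hub that caches messages per recipient."""

    def __init__(self):
        self.queues = {}

    def submit(self, recipient, message):
        """Sending side, step 3: store a message for later download."""
        self.queues.setdefault(recipient, []).append(message)

    def fetch(self, recipient):
        """Return and clear all messages waiting for the recipient."""
        return self.queues.pop(recipient, [])


def receive_from_hub(hub, own_name, known_uuids):
    """Download pending messages and keep only those with unseen UUIDs."""
    fresh = []
    for message in hub.fetch(own_name):
        event_id = message["meta"]["uuid"]["current"]
        if event_id in known_uuids:
            continue  # a) UUID already known: discard
        known_uuids.add(event_id)  # b) new UUID: continue processing
        fresh.append(message)
    return fresh


hub = DataHub()
msg = {"meta": {"uuid": {"current": "cert_at:aaaa-bbbb-cccc-dddd"}}, "payload": {}}
hub.submit("B", msg)
hub.submit("B", msg)  # the same event submitted twice
known = set()
assert len(receive_from_hub(hub, "B", known)) == 1
```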
To sum up, both exchange variants are useful. More research is needed, e.g. on a mixed infrastructure that has centralized parts but can be decentralized too. However, that is neither the purpose nor the aim of this IEP.
Before going too far down this road, I'd be looking at the suitability or adaptability of STIX / TAXII 2.1 (https://oasis-open.github.io/cti-documentation/stix/intro).
The STIX steering committee has spent years iterating and debating the data model for STIX. They've already done a lot of the hard work on how entities should reference one another, how TLP is implemented, consistent taxonomies, appropriate metadata and so on. There are also Python libs available, so it's more a case of working out how to integrate rather than reinvent.
TAXII provides an HTTP-based transport layer for STIX (or other data formats) which you can operate via push, pull, or otherwise relay via some sort of chained series of TAXII servers.
As a bonus, it would give IntelMQ sharing interoperability with other platforms which also implement STIX / TAXII. MISP is one of those (https://www.misp-project.org/2020/06/24/MISP.2.4.128.released.html), so I feel there'd be some good experience to draw on from their developers.
Best regards,
Chris
On 31/03/2021 2:56 am, Sebastian Waldbauer wrote:
> [...]
Dear Chris and list,
I agree with Chris that STIX/TAXII is one of the de facto standards for the exchange of security information (or an implicit de jure one? :-p).
At the same time, I am hesitant about variable formats such as JSON and XML. This is because I have to provide full-text search for such formats, and I often find that FTS doesn't work as expected with bigger datasets. (I am using PostgreSQL and PGroonga, but its index crashes very often. Maybe I should give tsvector/tsquery and pg_bigm a try.)
On the other hand, I also understand why it is required in the NoSQL age, so I don't have a clear opinion yet.
Hence, I'd raise a very humble objection to introducing multi-value columns and variable formats.
Thank you very much.
Best Regards,
Hi Chris,
Thanks for the input.
I looked at STIX/TAXII 2 a while ago and was also in contact with its creators back then. IIRC, and according to what the creators said, STIX/TAXII cannot be used for some/most of the data we are processing in IntelMQ. For example, how do you represent an open port (vulnerable service), an infected device or a malicious website? I don't see any STIX object listed on the page you linked that could match that kind of data.
Kind regards,
Sebastian
On 3/31/21 5:43 AM, Chris Horsley wrote:
> [...]
Yes, it does require a bit of a different mindset for STIX representation. It's less a single system event = a single serialised record, and more a package of entities which relate to each other in a graph style.
For the example of an open port / exploitable service (using https://docs.oasis-open.org/cti/stix/v2.1/cs01/stix-v2.1-cs01.html for reference):
* Open port / vulnerable service
  * Define an `infrastructure` entity to represent the vulnerable host.
  * Define a `tool` entity to represent the RDP server.
  * Define a `vulnerability` entity to represent the specific vulnerability (e.g. CVE #).
  * Bind them together with references as a directed graph: infrastructure `hosts` tool, tool `has` vulnerability.
By design, it's subject to interpretation since STIX is agnostic about the software consuming it.
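The graph described above could be serialised roughly as follows. This is a sketch with plain Python dicts rather than a STIX library, it has not been run through a STIX validator, and the host name and the CVE identifier are placeholders invented for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_object(object_type, **properties):
    """Build a minimal STIX 2.1-style object as a plain dict."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": f"{object_type}--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        **properties,
    }


def relate(source, relationship_type, target):
    """Link two objects with a relationship object (a directed edge)."""
    return stix_object(
        "relationship",
        relationship_type=relationship_type,
        source_ref=source["id"],
        target_ref=target["id"],
    )


host = stix_object("infrastructure", name="Host 192.0.2.10")  # placeholder
service = stix_object("tool", name="RDP server")
vuln = stix_object("vulnerability", name="CVE-0000-0000")  # placeholder ID

bundle = {
    "type": "bundle",
    "id": f"bundle--{uuid.uuid4()}",
    "objects": [
        host,
        service,
        vuln,
        relate(host, "hosts", service),  # infrastructure hosts tool
        relate(service, "has", vuln),    # tool has vulnerability
    ],
}
print(json.dumps(bundle, indent=2))
```

One flat IntelMQ event thus becomes five linked objects, which illustrates the difference in mindset between the two data models.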
If the scope of this feature is strictly IntelMQ <-> IntelMQ data exchange for the foreseeable future, you might consider something that more directly serialises IntelMQ records. In that case, there's probably still value in looking at the STIX spec for additional concepts and ideas from a thoroughly debated taxonomy.
Best regards,
Chris
On 31/03/2021 9:00 pm, Sebastian Wagner wrote:
> [...]
--
// Sebastian Wagner wagner@cert.at - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// An initiative of nic.at GmbH - https://www.nic.at/
// Company register number 172568b, LG Salzburg
Hi Chris and list,
the problems I see with adopting STIX are:
1) It does not quite fit the needs, IMHO.
2) Most users over here (Europe) went a different route.
3) Even DHS, which was pushing STIX a lot, mentioned in private that they went with MISP.
4) (Similar to (1)) IntelMQ is a different beast than a graph of objects all linked to each other. It's for the lower levels of automation. So, again, it does not quite fit the needs.
5) Yes, we should be able to import and export STIX (as MISP does), but that is a controlled hand-over point.
6) STIX is too bloated. It allows so many different interpretations that it becomes unclear what it actually wants to convey. For IntelMQ we need super clear data which can be used to automatically trigger actions (for example on firewalls, in SIEMs/IR etc.).
That being said, with IEP 03 and 04 I see a risk of bloat for IntelMQ as well. More on that in a separate mail.
On 06.04.2021, at 09:47, Chris Horsley chris.horsley@csirtfoundry.com wrote:
> [...]
As a bonus, it would give sharing inter-operation between IntelMQ and other platforms which also implement STIX / TAXII. MISP is one of those (https://www.misp-project.org/2020/06/24/MISP.2.4.128.released.html) so there'd be some good experiences to draw from their developers I feel.
Best regards,
Chris
On 31/03/2021 2:56 am, Sebastian Waldbauer wrote:
Dear IntelMQ Developers and Users,
nowadays security incidents are more important than 10 years ago. As IntelMQ can be used as a core element for automated security incident handling, we would like to provide a way to share information with other IntelMQ instances. This proposal is also an alternative to IEP03 insofar as the "multiple values" problem can be solved by using UUIDs to "link" related events in a backwards-compatible manner.
If you're interested, please let us know, so we can organize a hackathon for further discussions about the specification of the meta-information. Previously this idea was discussed in [0] and [1].
[0] https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur...
[1] https://github.com/certtools/intelmq/issues/1521

# IEP04: Internal Data Format: Meta Information and Data Exchange

To ease data exchange between two or more IntelMQ instances, we propose adding some meta-information to the events. "Linking" events could follow the same idea `git` uses: parent hashes (we would call them UUIDs).

### TL;DR

Communication between two or more IntelMQ instances & data exchange in a backwards-compatible format. Whether to use a P2P or a centralized architecture is a big topic, which has to be discussed after the format has been set.

### Why is metadata important?

Short and simple: to avoid race conditions & to be able to discard/drop events already processed by other instances.

### Meta information

Metadata is used to transfer some general data, which is not necessarily related to the event itself. It's more or less just information to keep events clear & sortable.
A message could look like:
{
  "meta": {
    "version": 1,  # protocol version, so we are allowed to fall back to old versions too
    "uuid": {
      "current": "cert_at:aaaa-bbbb-cccc-dddd",  # format to be decided
      "parent": "cert_at:xxxx-yyyy-zzzz-ffff"  # format to be discussed, if not set -> current is the parent uuid
    },
    "type": "event",
    "format": "intelmq"  # e.g. this field could contain "n6" or "idea", so the receiving component can decode on demand
  },
  "payload": {  # normal intelmq data
    "source.ip": "127.0.0.1",
    "source.fqdn": "example.com",
    "raw": "base64-blob"
  }
}
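As a rough sketch of how such a message could be built, the following wraps a flat IntelMQ-style payload into the proposed envelope. `wrap_event` is a hypothetical helper, not part of IntelMQ, and the UUID strings are the placeholders from the example above since their format is still to be decided.

```python
import base64
import json


def wrap_event(payload, current_uuid, parent_uuid=None, fmt="intelmq"):
    """Wrap a flat payload into the proposed meta/payload envelope."""
    return {
        "meta": {
            "version": 1,  # protocol version
            "uuid": {
                "current": current_uuid,
                # If no parent is set, current is the parent UUID.
                "parent": parent_uuid or current_uuid,
            },
            "type": "event",
            "format": fmt,
        },
        "payload": payload,
    }


payload = {
    "source.ip": "127.0.0.1",
    "source.fqdn": "example.com",
    "raw": base64.b64encode(b"original feed line").decode("ascii"),
}
message = wrap_event(payload, "cert_at:aaaa-bbbb-cccc-dddd")
print(json.dumps(message, indent=2))
```

Because the payload is carried unchanged, a receiver that only understands the old format can still read `message["payload"]` as a normal event.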
Tell us your opinion about adding non-standardized meta-information fields (e.g. RTIR ticket number, origin, other local contact information, and so on).

#### The UUID

For the UUID there are multiple options:
1. Generate a random 128-bit UUID
2. A list of entities which have already dealt with this event. For example, if an event was passed on from cert-at to cert-ee, the field could look like `!cert-at!cert-ee`. A message sending loop can be detected if the instance's own name is already in this field upon reception.
3. Using CyCat: `publisher-short-name:project-short-name:UUID`. For example: `cert-at:intelmq:72ddb00c-2d0a-4eea-b7ac-ae122b8e6c3b`, or `cert-pl:n6:f60c9fb9-81f9-4e0b-8a44-ea41326a15b3`. Some more research and discussion is required before the implementation of this option. Have a look at https://www.cycat.org/services/concept/ for more details.
4. A hash: a benefit of using a hash is that it can be recalculated on every IntelMQ instance.
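To make the four options easier to compare, here is a minimal sketch of each. All function names are hypothetical, and the choice of SHA-256 over a sorted JSON dump for the hash option is only an assumption for illustration; the proposal does not specify a hash function or which fields go into it.

```python
import hashlib
import json
import uuid


def random_uuid():
    # Option 1: a random 128-bit UUID (version 4).
    return str(uuid.uuid4())


def append_to_path(path, own_name):
    # Option 2: list of entities that already handled the event.
    # A loop is detected if our own name is already in the field.
    handled_by = [name for name in path.split("!") if name]
    if own_name in handled_by:
        raise ValueError(f"message loop detected: {own_name} already in {path}")
    return f"{path}!{own_name}"


def cycat_style_id(publisher, project):
    # Option 3: publisher-short-name:project-short-name:UUID
    return f"{publisher}:{project}:{uuid.uuid4()}"


def content_hash(payload):
    # Option 4: a hash every instance can recalculate from the payload.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(append_to_path("!cert-at", "cert-ee"))
```

Only options 2 and 4 carry information beyond uniqueness: the path records who has handled the event, and the hash is reproducible from the data.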
### Exporting events to other systems

In IntelMQ 2.x the events consist only of the "payload" and carry no meta information. For local storage such as file output or databases, the meta information may not be relevant in some use-cases. So it needs to be possible to export events *without* meta information, which is also the backwards-compatible behaviour.
The "type" field exists in the current format as "__type" in the flat payload structure. In the output bots there's currently a boolean parameter `message_with_type` to include the field `__type` in the "export". For optionally exporting meta-information like uuid or format, a similar logic could be used.
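A possible shape of such an export, as a sketch only: `message_with_type` is the existing parameter mentioned above, while `export_event`, `message_with_meta` and the field names `__uuid` and `__format` are hypothetical and chosen by analogy with `__type`.

```python
def export_event(message, message_with_type=False, message_with_meta=False):
    """Flatten an envelope into an exportable dict.

    With both flags off, only the payload is exported, which matches the
    backwards-compatible behaviour described above.
    """
    exported = dict(message["payload"])
    meta = message["meta"]
    if message_with_type:
        exported["__type"] = meta["type"]
    if message_with_meta:
        # Hypothetical field names, modelled on "__type".
        exported["__uuid"] = meta["uuid"]["current"]
        exported["__format"] = meta["format"]
    return exported


example = {
    "meta": {
        "version": 1,
        "uuid": {"current": "cert_at:aaaa-bbbb-cccc-dddd"},
        "type": "event",
        "format": "intelmq",
    },
    "payload": {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
}
print(export_event(example))
print(export_event(example, message_with_type=True, message_with_meta=True))
```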
### How can data exchange work?

This depends on how IntelMQ instances communicate: either peer-to-peer or via a central data hub. Both have pros and cons.

#### P2P (Peer-to-Peer)

Decentralized network
- Less downtime: the downtime of one instance does not affect the whole network
- Better privacy: data is not shared to an unrelated instance
- More secure: data can optionally be encrypted (key-exchange between instances?)
- Decentralized and local maintenance
~ Network latency depends on server locations
- Networking issues may occur
How data exchange could look between instances:
1) Instance A has events which should be relayed to Instance B & C, because A is not sure who the actual receiver should be
2) Instance A ensures all messages have a UUID
3) Instance A sends the data to Instance B & Instance C
4) Instance B checks the data & concludes that it is meant for Instance C, so B passes it on
5) Instance C receives the data from Instance A & Instance B
6) Instance C checks the UUID, which is the same, & drops the copy from Instance B
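The relay scenario above can be simulated in a few lines. This is a sketch under simplifying assumptions: the `Instance` class is hypothetical, delivery is a direct method call instead of a network transfer, and each instance keeps its seen UUIDs in memory.

```python
class Instance:
    """Hypothetical IntelMQ instance that drops already-seen UUIDs."""

    def __init__(self, name):
        self.name = name
        self.seen = set()
        self.accepted = []  # (sender, message) pairs that were processed

    def receive(self, message, sender):
        current = message["meta"]["uuid"]["current"]
        if current in self.seen:
            return False  # same UUID as before: drop the copy
        self.seen.add(current)
        self.accepted.append((sender, message))
        return True


message = {
    "meta": {"uuid": {"current": "cert-at:intelmq:0001"}},
    "payload": {"source.ip": "192.0.2.1"},
}

b = Instance("B")
c = Instance("C")

# Instance A sends the data to both B and C.
b.receive(message, sender="A")
kept_from_a = c.receive(message, sender="A")
# B decides the data is meant for C and passes it on.
kept_from_b = c.receive(message, sender="B")

print(kept_from_a, kept_from_b)
```

Which copy survives depends only on arrival order; if B's copy reached C first, the copy from A would be the one dropped.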
#### (Central) Data hub
- Less maintenance: Is maintained by the hub administrator
- Central data storage (reports can optionally be cached to be downloaded later)
~ Central data analysis (e.g. statistics) is possible
~ Network latency depends on server locations
- Single point of failure: if network problems occur, no exchange is possible
As already seen above, data exchange here would be less complicated. The sending may look like:
- Instance A has events which should be relayed to Instance B (e.g. different country)
- Instance A ensures all messages have a UUID
- Instance A sends these messages to the data hub
The reception side can look like:
- Instance B connects to central instance
- Instance B queries and downloads all available messages
- Upon reception, all messages are de-duplicated based on the UUID:
a) If the UUID is already known, discard the message
b) If the UUID has not been seen before, continue with processing
To sum up, both exchange variants are useful. More research is needed, e.g. into a mixed infrastructure with centralized parts that can also work in a decentralized way. However, this is neither the purpose nor the aim of this IEP.
-- // Sebastian Wagner wagner@cert.at
- T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
IntelMQ-dev mailing list https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://intelmq.readthedocs.io/
Hi Aaron,
These are all valid criticisms. The strength and weakness of STIX is that, because it is platform-agnostic, it does indeed allow for interpretation depending on what type of system you're ingesting it into and your end goal. FWIW, here in Australia STIX is seeing more interest for inter-org sharing at the moment, particularly between commercial orgs and government. On the other hand, lots of the technical IR community here uses MISP sharing (especially when you don't have 100k+ USD for a commercial TIP).
Considering the good points you've made, I agree with you that STIX is not the best candidate for an inter-IntelMQ record serialisation format. However, there are probably some good bits to steal^H^H^H be inspired by like inter-entity references, TLP levels, internationalisation, confidence levels, taxonomy, reporting back sightings of IoCs (if that's something collectively wanted).
Best regards,
Chris
On 14/04/2021 10:48 pm, L. Aaron Kaplan wrote:
Hi Chris and list,
the problems I see with adopting STIX are:
1) it does not quite fit the needs IMHO
2) most users over here (Europe) went a different route.
3) even DHS, which was pushing STIX a lot, mentioned in private that they went MISP
4) (similar to (1)) - IntelMQ is a different beast than a graph of objects all linked to each other. It's for the lower levels of automation. So, again, it does not quite fit the needs
5) Yes, we should be able to import and export STIX (as MISP does), but .. that's a controlled hand over point
6) STIX is too bloated. It allows sooo many different interpretations that it ends up being unclear what it actually wants to convey. For IntelMQ we need super clear data which can be used to automatically trigger actions (for example on firewalls, in SIEMs/IR etc).
That being said, with the IEP 03 and 04 I see a risk of bloat for IntelMQ as well. More on that in a separate mail.
From: Sebastian Waldbauer waldbauer@cert.at, Date: Mar 30, 2021
nowadays security incidents are more important than 10 years ago. As IntelMQ can be used as a core element for automated security incident handling, we would like to provide a way to share information with other IntelMQ instances. This proposal is also an alternative to IEP03 insofar as the "multiple values" problem can be solved by using UUIDs to "link" related events in a backwards-compatible manner.
Hello,
couple of notes (as Idea author).
We decided not to go for linking as the main means to allow multiple IPs/hostnames, as it works only for source:target in the 1:1, 1:N and M:1 cases. 1:1 is the current state of affairs in IntelMQ, 1:N is for example a scan or bruteforce coming from one machine to many, M:1 is for example a DDoS against one specific target. And then there is M:N, for example detectors which (based on netflow statistics) detect DDoS, but with no explicit connection information, so you have information about traffic from M sources going to N targets. In a world where you have only 1:1 mapping events and linking, you end up with a Cartesian product (which is not what you want :) ), or with two linked events: one with only sources and no targets and a second with only targets and no sources (which is arguably clumsy).
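A small worked example of the M:N case, as an illustration only. The addresses are documentation ranges, and `destination.ip` is used here for what the text calls the target.

```python
from itertools import product

# A detector reports traffic from M sources to N targets,
# without knowing which source talked to which target.
sources = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]   # M = 3
targets = ["198.51.100.1", "198.51.100.2"]          # N = 2

# Forcing this into 1:1 events yields the Cartesian product:
pairwise_events = [
    {"source.ip": source, "destination.ip": target}
    for source, target in product(sources, targets)
]

print(len(pairwise_events))           # M * N events
print(len(sources) + len(targets))    # M + N values actually observed
```

Each of the M * N events asserts a specific source-to-target connection that the detector never observed, which is why the product is not what you want.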
Second use case - deduplicating in case of distribution circles - is easy if everyone uses the same format or passes the IDs along (whatever they are, just reasonably unique, UUID is fine). However, a problem arises with external sources (which are currently the main source of information in IntelMQ). Consider: organisation A gets an event from Shadowserver into IntelMQ, which recasts it into the IntelMQ format and adds an arbitrary ID. Organisation B does the same. Organisation C, which gets them both, with two distinct IDs, is unable to decide deterministically whether the event is a duplicate or just a coincidence. No clear idea of a solution here, maybe a stable set of "external source" identifiers (for Shadowserver, Shodan, ...) plus a stable ID/hash generated deterministically from important fields... (as you mentioned, some CyCat application?)
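The difference between arbitrary and deterministic IDs can be sketched like this. Everything here is an assumption for illustration: the `STABLE_FIELDS` selection, the source identifier `shadowserver`, and the use of SHA-256 are not agreed on anywhere in this thread.

```python
import hashlib
import uuid

# Hypothetical choice of "important fields" that do not depend on
# which organisation processed the event.
STABLE_FIELDS = ("time.source", "source.ip", "source.port", "classification.type")


def arbitrary_id(org):
    # What happens today: each organisation mints its own random ID.
    return f"{org}:{uuid.uuid4()}"


def deterministic_id(source_name, event):
    # Stable external source identifier plus a hash over stable fields.
    parts = [source_name]
    parts += [f"{field}={event.get(field, '')}" for field in STABLE_FIELDS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# The same external event as seen by organisations A and B.
# Only the locally added fields differ.
event_at_a = {
    "time.source": "2021-03-30T10:00:00+00:00",
    "source.ip": "192.0.2.1",
    "source.port": 3389,
    "classification.type": "vulnerable-system",
    "time.observation": "2021-03-30T12:00:00+00:00",
}
event_at_b = dict(event_at_a, **{"time.observation": "2021-03-30T13:30:00+00:00"})

print(arbitrary_id("org-a") == arbitrary_id("org-b"))
print(deterministic_id("shadowserver", event_at_a) == deterministic_id("shadowserver", event_at_b))
```

With the deterministic variant, organisation C can recognise the two events as the same without A and B ever having exchanged IDs.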
-- Pavel Kácha, CESNET
Dear IntelMQ Developers and Users
In today's hackathon we discussed IEP04 in detail and the proposal was generally adopted. To cover additional use-cases, the UUIDs will be extended to also cover certain kinds of relations between events by referring to them using UUIDs. The exact details of this format will be discussed in the coming days, on intelmq-dev or on GitHub (https://github.com/certtools/ieps).
Thanks - again ;) - everybody for the feedback!
kind regards Sebastian
Hi,
thoughts about https://github.com/certtools/ieps/tree/main/004
== which events are equal?
As this is about an exchange format between IntelMQ instances, someone could define how a hash about the event data is calculated easily (as it is the identical code everywhere). This is the same as defining what equality means.
This way no "universally unique identifier" needs to be invented or transferred, thereby avoiding the danger that the same event gets several fresh random ids because of race conditions. (Example: two IntelMQ instances have the same feed and both receive the same event before having talked about it.)
(If you actually end up using a hash, don't call it UUID. :) )
BTW: The concept of hierarchy (like the hash trees in SCMs) is not entirely clear to me. Is this about one instance stating that it has seen this part of meta data from the other instance?
== Do instances trust each other fully?
Shouldn't a concept about event exchange include a consideration of trust between the instances? While I believe there are very good relations between many CERT organisations, the trust in instances they or others may run is not endless. (Example: An IntelMQ server gets compromised, e.g. by a previously unknown hardware defect, and the attackers want to obstruct the network. They enter bad metadata and may want to achieve that some CERTs do not get some events. Okay, far fetched.)
In my imagination it makes sense that each instance will have their own set of sources and this may have a different piece of info than the others (like a restricted national feed) and may only like to share a part of this info.
Regards,
Bernhard
ps.: Thanks for putting the IEPs up with markdown rendering, reads much better. :)
Hi,
On 4/22/21 4:56 PM, Bernhard Reiter wrote:
== which events are equal?
As this is about an exchange format between IntelMQ instances, someone could define how a hash about the event data is calculated easily (as it is the identical code everywhere). This is the same as defining what equality means.
This way no "universally unique identifier" needs to be invented or transferred, thereby avoiding the danger that the same event gets several fresh random ids because of race conditions. (Example: two IntelMQ instances have the same feed and both receive the same event before having talked about it.)
I'm afraid that this may be hard to achieve, but it would definitely be an advantage. What we agreed on, is that having a static identifier (not changing with the content) solves the use-case to represent inter-event relations (links, also instead of IEP003).
IMO the development of such a hash is worth the effort, but as part of a separate IEP.
(If you actually end up using a hash, don't call it UUID. :) )
Of course :)
BTW: The concept of hierarchy (like the hash trees in SCMs) is not entirely clear to me. Is this about one instance stating that it has seen this part of meta data from the other instance?
For this part, participants of the hackathon presented several use-cases and ideas, so I leave the floor to them to explain them with examples. This is also the part which needs more discussion/specification now.
== Do instances trust each other fully?
Shouldn't a concept about event exchange include a consideration of trust between the instances? While I believe there are very good relations between many CERT organisations, the trust in instances they or others may run is not endless. (Example: An IntelMQ server gets compromised, e.g. by a previously unknown hardware defect, and the attackers want to obstruct the network. They enter bad metadata and may want to achieve that some CERTs do not get some events. Okay, far fetched.)
In my imagination it makes sense that each instance will have their own set of sources and this may have a different piece of info than the others (like a restricted national feed) and may only like to share a part of this info.
Sure. It's always up to the administrator to define what will be collected and what will be shared with whom. IEP004 is *not* about sharing data (un-)conditionally, it does not even define a transmission layer/protocol. IEP004 is only one (small) part to make cross-instance data sharing easier. The thoughts about trust are good, but I'd like to not solve that problem in that IEP but rather keep the focus on the meta-information.
You're like a never-ending spring of good ideas :)
kind regards Sebastian
From: Bernhard Reiter bernhard@intevation.de, Date: Apr 22, 2021
== Do instances trust each other fully?
Shouldn't a concept about event exchange include a consideration of trust of the instances? While I believe there are very good relations between many CERT organisations, the trust of instances they or others may run is not endless. (Example: An IntelMQ server gets compromised, e.g. by a previously unknown hardware defect and the attackers want to obstruct the network. They enter bad metadata and may want to achieve that some CERTs do not get some events. Okay, far fetched.)
In my imagination it makes sense that each instance will have their own set of sources and this may have a different piece of info than the others (like a restricted national feed) and may only like to share a part of this info.
There are multiple facets of trust in this field, all with their own possible set of solutions and can of worms. :)
1, How do we trust the detection method or external source of the data? (Aka possible ratio of false positives or malfunction.)
2, How do we trust the fellow peer org for the data they produce? (Similar to 1 in fact.)
3, How do we trust the fellow peer org for the data they transfer/relay? (Here we might end up delving into signing the data, or even partial signatures, and all the related PKI stuff.)
4, How do we trust the fellow peer org that it will not disclose information we have sent there if we do not want it to? (Aka honoring the TLP.)
-- Pavel
Dear developer colleagues,
While the hackathon's conclusion to go with IEP04 was clear, we still need to define the exact format of the meta-information, which we intentionally did not decide upon in the meeting last week. Please provide your input on these two topics:
- the UUID topic: https://github.com/certtools/ieps/issues/1
- the format/type topic: https://github.com/certtools/ieps/issues/3
Thanks in advance Sebastian
On 4/22/21 3:24 PM, Sebastian Wagner wrote:
Dear IntelMQ Developers and Users
In today's hackathon we discussed IEP04 in detail and the proposal was generally adopted. To cover additional use-cases, the UUIDs will be extended to also cover certain kinds of relations between events by referring to them using UUIDs. The exact details of this format will be discussed in the next days, on intelmq-dev or on GitHub (https://github.com/certtools/ieps).
Thanks - again ;) - everybody for the feedback!
kind regards Sebastian
On 3/30/21 5:56 PM, Sebastian Waldbauer wrote:
Dear IntelMQ Developers and Users,
Nowadays security incidents are more important than 10 years ago. As IntelMQ can be used as a core element for automated security incident handling, we would like to provide a way to share information with other IntelMQ instances. This proposal is also an alternative to IEP03 insofar as solving the "multiple values" problem is possible by using UUIDs to "link" related events in a backwards-compatible manner.
If you're interested, please let us know, so we could organize a hackathon for further discussions about the specification of the meta-information. Previously this idea was discussed in [0] and [1].
[0] https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectur...
[1] https://github.com/certtools/intelmq/issues/1521

# IEP04: Internal Data Format: Meta Information and Data Exchange

To ease data exchange between two or more IntelMQ instances, adding some meta-information to the events can make this sharing easier in certain regards. "Linking" events could be based on the same theory `git` uses: parent hashes (we would call them UUIDs).
### TL;DR
Enable communication between two or more IntelMQ instances and exchange data in a backwards-compatible format. Whether to use a P2P or a centralized architecture is a big topic, which has to be discussed after the format has been set.

### Why is metadata important?
Short and simple: to avoid race conditions and to be able to discard/drop events already processed by other instances.

### Meta information
Metadata is used to transfer some general data which is unlikely to be related to the event itself. It is more or less just information to keep events clear and sortable.
A message could look like:
    {
      "meta": {
        "version": 1,  # protocol version, so we are allowed to fallback to old versions too
        "uuid": {
          current: "cert_at:aaaa-bbbb-cccc-dddd"  # format to be decided
          parent: "cert_at:xxxx-yyyy-zzzz-ffff"   # format to be discussed, if not set -> current is the parent uuid
        },
        "type": "event",
        "format": "intelmq",  # i. e. this field could contain "n6" or "idea", so the receiving component can decode on demand.
      },
      "payload": {  # normal intelmq data
        "source.ip": "127.0.0.1",
        "source.fqdn": "example.com",
        "raw": base64-blob
      }
    }
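The draft message above is not valid JSON as written (comments, unquoted keys), so as a rough sketch of what a sender could produce, the following wraps a flat IntelMQ payload into that envelope. The helper name `wrap_event`, the use of a random UUID and the `publisher:uuid` identifier layout are assumptions for illustration only, since the identifier format is still to be decided.

```python
import base64
import json
import uuid


def wrap_event(payload, publisher, parent=None):
    """Wrap a flat payload dict in the proposed meta/payload envelope."""
    current = f"{publisher}:{uuid.uuid4()}"  # identifier format is an assumption
    return {
        "meta": {
            "version": 1,
            "uuid": {
                "current": current,
                # Rule from the draft: if no parent is set, current is the parent.
                "parent": parent if parent is not None else current,
            },
            "type": "event",
            "format": "intelmq",
        },
        "payload": payload,
    }


payload = {
    "source.ip": "127.0.0.1",
    "source.fqdn": "example.com",
    "raw": base64.b64encode(b"original feed line").decode("ascii"),
}
message = wrap_event(payload, "cert_at")
encoded = json.dumps(message)  # serializable, so any transport can carry it
```

A follow-up event would pass `parent=message["meta"]["uuid"]["current"]` to link itself to the first one.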
Tell us your opinion about adding non-standardized meta-information fields (e.g. RTIR ticket number, origin, other local contact information, and so on).
#### The UUID
For the UUID there are multiple options:
1. Generate a random 128-bit UUID.
2. A list of entities which have already dealt with this event. For example, if an event was passed on from cert-at to cert-ee, the field could look like `!cert-at!cert-ee`. A message sending loop can be detected if the own name is already in this field upon reception.
3. Using CyCat: `publisher-short-name:project-short-name:UUID`. For example: `cert-at:intelmq:72ddb00c-2d0a-4eea-b7ac-ae122b8e6c3b`, or `cert-pl:n6:f60c9fb9-81f9-4e0b-8a44-ea41326a15b3`. Some more research and discussion is required before the implementation of this option. Have a look at https://www.cycat.org/services/concept/ for more details.
4. A hash: a benefit of using a hash is that we're able to recalculate it on every IntelMQ instance.
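Options 2 and 3 above can be sketched in a few lines. This is only an illustration under assumptions: the function names `append_hop`, `is_loop` and `cycat_style_id` are hypothetical, and the CyCat-style string merely follows the `publisher:project:UUID` pattern quoted above without consulting any CyCat service.

```python
import uuid


def append_hop(path, own_name):
    """Option 2: record that this instance handled the event, e.g. '!cert-at!cert-ee'."""
    return f"{path}!{own_name}"


def is_loop(path, own_name):
    """Option 2: a loop exists if our own name is already in the hop list."""
    # The path starts with '!', so the first split element is empty and skipped.
    return own_name in path.split("!")[1:]


def cycat_style_id(publisher, project):
    """Option 3: build a 'publisher:project:UUID' identifier with a random UUID."""
    return f"{publisher}:{project}:{uuid.uuid4()}"
```

For instance, `is_loop("!cert-at!cert-ee", "cert-at")` is true, so cert-at would drop the message instead of relaying it again.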
### Exporting events to other systems
In IntelMQ 2.x, events consist only of the "payload" and carry no meta information. For local storage such as file output or databases, the meta information may not be relevant in some use-cases. So it needs to be possible to export events *without* meta information, which is also the backwards-compatible behaviour.
The "type" field exists in the current format as "__type" in the flat payload structure. In the output bots there's currently a boolean parameter `message_with_type` to include the field `__type` in the "export". For optionally exporting meta-information like uuid or format, a similar logic could be used.
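As a sketch of how such an optional export could behave, assuming the envelope layout from the draft message: the function `export_event`, its `with_meta` flag and the flat key names `__uuid`, `__parent` and `__format` are hypothetical. Only `__type` and the idea of a boolean switch like `message_with_type` come from the text above.

```python
def export_event(message, with_type=False, with_meta=False):
    """Flatten an envelope message for output; defaults give the 2.x-style payload."""
    exported = dict(message["payload"])  # copy, so the message itself stays untouched
    if with_type:
        # Assumption: map the envelope's lower-case type to the "Event" spelling
        # used for __type in the flat format.
        exported["__type"] = message["meta"]["type"].capitalize()
    if with_meta:
        # Hypothetical flat key names for the additional meta-information.
        exported["__uuid"] = message["meta"]["uuid"]["current"]
        exported["__parent"] = message["meta"]["uuid"]["parent"]
        exported["__format"] = message["meta"]["format"]
    return exported
```

With both flags left at their defaults, the output is exactly the payload, which is the backwards-compatible behaviour described above.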
### How can data exchange work?
This now depends on how IntelMQ instances communicate, either peer-to-peer or via a central data hub. Both have pros and cons.

#### P2P (Peer-to-Peer)
Decentralized network
- Fewer downtimes: a downtime of one instance does not affect the whole network
- Better privacy: data is not shared with an unrelated instance
- More secure: data can optionally be encrypted (key-exchange between instances?)
- Decentralized and local maintenance
~ Network latency depends on server locations
- Networking issues may occur
How data exchange could look between instances:
1) Instance A has events which should be relayed to Instance B & C, because they're not sure who the actual receiver should be
2) Instance A ensures all messages have a UUID
3) Instance A sends the data to Instance B & Instance C
4) Instance B checks the data & they're sure that the data should be for Instance C
5) Instance C receives data from Instance A & Instance B
6) Instance C checks the UUID, which is the same & drops the package from Instance B
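The six steps above can be simulated with a small in-memory sketch. The `Instance` class and its `receive` method are hypothetical and only model the de-duplication decision; no real transport or IntelMQ API is involved.

```python
class Instance:
    """Toy model of an IntelMQ instance that drops messages it has already seen."""

    def __init__(self, name):
        self.name = name
        self.seen = set()       # UUIDs of messages already accepted
        self.processed = []     # (sender, message) pairs that were kept

    def receive(self, message, sender):
        """Return True if the message is new and kept, False if it is a duplicate."""
        key = message["meta"]["uuid"]["current"]
        if key in self.seen:
            return False
        self.seen.add(key)
        self.processed.append((sender, message))
        return True


# Steps 1-3: A sends the same message (same UUID) to B and C.
message = {"meta": {"uuid": {"current": "cert_at:aaaa-bbbb-cccc-dddd"}}, "payload": {}}
b, c = Instance("B"), Instance("C")
kept_by_b = b.receive(message, sender="A")
kept_from_a = c.receive(message, sender="A")
# Steps 4-6: B relays it to C, which recognises the UUID and drops the copy.
kept_from_b = c.receive(message, sender="B")
```

Here C keeps the copy that arrived first and discards the relayed one, which only works if relaying instances leave the UUID unchanged.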
#### (Central) Data hub
- Less maintenance: is maintained by the hub administrator
- Central data storage (reports can optionally be cached to be downloaded later)
~ Central data analysis (e.g. statistics) is possible
~ Network latency depends on server locations
- point of failure: if network problems occur, no exchange is possible
As already seen above, data exchange here would be less complicated. The sending may look like:
1) Instance A has events which should be relayed to Instance B (e.g. different country)
2) Instance A ensures all messages have a UUID
3) Instance A sends these messages to the data hub
The reception side can look like:
1) Instance B connects to central instance
2) Instance B queries and downloads all available messages
3) Upon reception, all messages are de-duplicated based on the UUID:
   a) If the UUID is already known, discard the message
   b) If the UUID has not been seen before, continue with processing
To sum up, both exchange variants are useful. More research is needed, e.g. into a mixed infrastructure that has centralized parts but can be decentralized too. However, this is neither the purpose nor the aim of this IEP.
-- // Sebastian Wagner wagner@cert.at - T: +43 1 5056416 7201 // CERT Austria - https://www.cert.at/ // Eine Initiative der nic.at GmbH - https://www.nic.at/ // Firmenbuchnummer 172568b, LG Salzburg
Hello,
at the hackathon we decided to try to document some real use-cases that might be covered by IntelMQ. I guess this is still a bit of a design phase and does not fit into one of the issues at GitHub, so I'll try to kick it off here for discussion (and decisions of what IntelMQ does and does not want to support, or what to consider and what to scratch).
1, Events with multiple target IPs/hostnames/ports
- Horizontal portscan (multiple machines, one port)
- SSH bruteforce (multiple machines, one port/service)
- Vertical portscan (one machine, multiple ports)
2, Events with multiple source IPs/hostnames/ports
- Targeted DDoS (multiple machines/reflectors shoot at one target)
3, Events with both multiple sources and targets
- Wider DDoS (multiple machines/reflectors shoot at multiple machines, whole subnet, etc.)
4, Events with one or more both sources and targets, where exact pattern is not known
- Aka one of [1, 2, 3], but we do not have complete information about specific connections made, possibly because the event/detection came from the statistical detector or from some form of aggregation (where original full information from for example netflow is already lost).
I guess these (1-4) initiated creation of IEP03 and IEP04 and probably are the only ones worth considering now.
Taking into account the possibility of linking of events, there might be other orthogonal use-cases:
5, Identification of identical events from possibly the same source to avoid duplication/circles
- aka some form of stable identifier
6, When target organisation contacts source organisation for more info, identification of where event came from internally
- aka possibility to put there the internal (opaque) identifier, like CESNET-RT#2235 (Request tracker), or Idea:UUID (what Idea event was converted into this IntelMQ event)
7, Meta-events
- event, linking together multiple completely different events as one incident (email address of spammer from spam email, IPs of spamming mailservers, phishing URL from spam email)
8, Correlated events
- aka different events, but identified as related/part of other events (like ongoing attack)
9, Modification or deletion/withdrawal of information
- aka "this event replaces that event with new info", or "that event was wrong, sent by error, forget it"
All above are ones we considered in Idea (see ID, AltNames, CorrelID, AggrID, PredID, RelID at [1], and not yet implemented GroupID), and I personally consider:
- 1-4, maybe 5 quite important,
- 6 handy (nice to have)
- 7-9 - we incorporated possibility of these, but in fact never used
I believe pretty much all are solvable by linking of events (IEP04):
1, 2, 3 as bunch of linked events with source-target relation in each of them
4 as two linked events - one with all the sources, one with all the targets
5 as additional calculated identifier, hard part is not storage, but standardization/calculation
6 as additional opaque (freehand, non UUID) identifiers
7, 8 as bunch of linked events, with possibility of some meta-event maybe
9 as additional type of link
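To illustrate the first line of that mapping (use-cases 1-3 as a bunch of linked events), here is a minimal sketch that turns one observation with several targets into single-target events sharing a parent identifier. The function name `split_into_linked_events`, the publisher name `cert-example` and the `publisher:uuid` identifier layout are assumptions; the linking format itself is still under discussion.

```python
import uuid


def new_id(publisher):
    """Hypothetical identifier layout: publisher short name plus random UUID."""
    return f"{publisher}:{uuid.uuid4()}"


def split_into_linked_events(source_ip, targets, publisher="cert-example"):
    """Create one event per (ip, port) target, all linked via the same parent."""
    parent = new_id(publisher)  # shared by all events of this observation
    events = []
    for destination_ip, destination_port in targets:
        events.append({
            "meta": {
                "version": 1,
                "uuid": {"current": new_id(publisher), "parent": parent},
                "type": "event",
                "format": "intelmq",
            },
            "payload": {
                "source.ip": source_ip,
                "destination.ip": destination_ip,
                "destination.port": destination_port,
            },
        })
    return events


# Horizontal portscan: one source, several machines, one port.
scan = split_into_linked_events(
    "192.0.2.1", [("198.51.100.1", 22), ("198.51.100.2", 22)]
)
```

Each resulting event stays a normal single-valued IntelMQ event, and a receiver can regroup them by comparing the parent identifier.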
-- Pavel
Dear Pavel,
Thank you very much for this great composition of possible use-cases. May I integrate them into the current IEP04 text?
Sebastian
[1] https://idea.cesnet.cz/en/definition
IntelMQ-dev mailing list https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://intelmq.readthedocs.io/
Sure! Use as you wish/need.
-- Pavel
From: Sebastian Wagner wagner@cert.at, Date: May 03, 2021
Dear Pavel,
Thank you very much for this great composition of possible use-cases. May I integrate them into the current IEP04 text?
Sebastian
Hi,
I fully agree that use-cases 1-4 can be solved by adding linking information to IntelMQ messages.
Regarding the use-cases 5+:
Use-case 5 is of course very interesting, but I'd like to not discuss that one as part of IEP04. Defining such an algorithm/hash for identifying similar or almost-identical events would take some time and deserves its own proper discussion and IEP.
6 (internal identifier) is just freeform text, so it is easy to implement if needed.
7 (Meta-events) and 9 (Modification or deletion/withdrawal of information) would introduce a different type of message, and I think it is - in the first place - not related to Meta-information of events.
8 (Correlated events) is either a meta-event or a simple link (1-4)
Cheers, Sebastian
Given that we have not yet reached a consensus on the exact format changes and that IntelMQ 3.0 is approaching (in fact, we want to do the first release candidate at the end of May), I propose to postpone the implementation of IEP03 until after IntelMQ 3.0, maybe 3.1. Implementing a major format change in a rush only causes trouble.
We have at least two open discussions:
- How to store linking information using UUIDs: https://github.com/certtools/ieps/issues/1
- The specification of the format and type fields: https://github.com/certtools/ieps/issues/3
Again, I call on all contributors and users to participate in the discussion, on the mailing list or on GitHub.
Sebastian