[IntelMQ-users] IEP04: IntelMQ Data Format - Meta-Information

Sebastian Wagner wagner at cert.at
Wed Mar 31 12:00:48 CEST 2021


Hi Chris,

Thanks for the input.

I looked at STIX/TAXII 2 a while ago and was in contact with its
creators back then. IIRC, and according to what the creators said,
STIX/TAXII cannot be used for some/most of the data we are processing
in IntelMQ. For example, how do you represent an open port (vulnerable
service), an infected device or a malicious website? I don't see any
STIX Object on the page you linked that could match that kind of data.

kind regards
Sebastian

On 3/31/21 5:43 AM, Chris Horsley wrote:
>
> Before going too far down this road, I'd be looking at the suitability
> or adaptability of STIX / TAXII 2.1
> (https://oasis-open.github.io/cti-documentation/stix/intro).
>
> The STIX steering committee has spent years iterating and debating the
> data model for STIX. They've already done a lot of the hard work on
> how entities should reference one another, how TLP is implemented,
> consistent taxonomies, appropriate metadata and so on. There's also
> Python libs available, so it's more a case of working out how to
> integrate rather than reinvent.
>
> TAXII provides an HTTP-based transport layer for STIX (or other data
> formats) which you can operate via push, pull, or otherwise relay via
> some sort of chained series of TAXII servers.
>
> As a bonus, it would give sharing inter-operation between IntelMQ and
> other platforms which also implement STIX / TAXII. MISP is one of
> those
> (https://www.misp-project.org/2020/06/24/MISP.2.4.128.released.html)
> so there'd be some good experiences to draw from their developers I feel.
>
> Best regards,
>
> Chris
>
> On 31/03/2021 2:56 am, Sebastian Waldbauer wrote:
>> Dear IntelMQ Developers and Users,
>>
>> Security incidents are more important today than they were 10 years
>> ago. As IntelMQ can be used as a core element of automated security
>> incident handling, we would like to provide a way to share information
>> with other IntelMQ instances. This proposal is also an alternative to
>> IEP03 insofar as the "multiple values" problem can be solved by using
>> UUIDs to "link" related events in a backwards-compatible manner.
>>
>> If you're interested, please let us know, so we could organize a
>> hackathon for further discussions about the specification of the
>> meta-information.
>> Previously this idea was discussed in [0] and [1].
>>
>> [0]
>> https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-general-requirements
>> [1] https://github.com/certtools/intelmq/issues/1521
>> # IEP04: Internal Data Format: Meta Information and Data Exchange
>> To ease data exchange between two or more IntelMQ instances, adding
>> some meta-information to the events can make this sharing easier in
>> certain regards.
>> "Linking" events could follow the same principle as `git` commits,
>> which reference their parent hashes (we would call them UUIDs).
>>
>> ### TL;DR
>> Enable two or more IntelMQ instances to communicate and exchange data
>> in a backwards-compatible format. P2P versus centralized architecture
>> is a big topic, which has to be discussed after the format has been set.
>>
>> ### Why is metadata important?
>> In short: to avoid race conditions and to be able to discard/drop
>> events from other instances that have already been processed.
>>
>> ### Meta information
>> Metadata is used to transfer general data which is not directly
>> related to the event itself. It is mostly information that keeps
>> events clearly identifiable and sortable.
>>
>> A message could look like:
>>
>> {
>>     "meta": {
>>         # protocol version, so we are allowed to fall back to old
>>         # versions too
>>         "version": 1,
>>         "uuid": {
>>             # format to be decided
>>             "current": "cert_at:aaaa-bbbb-cccc-dddd",
>>             # format to be discussed; if not set, current is the
>>             # parent uuid
>>             "parent": "cert_at:xxxx-yyyy-zzzz-ffff"
>>         },
>>         "type": "event",
>>         # e.g. this field could contain "n6" or "idea", so the
>>         # receiving component can decode on demand
>>         "format": "intelmq"
>>     },
>>     "payload": {  # normal intelmq data
>>         "source.ip": "127.0.0.1",
>>         "source.fqdn": "example.com",
>>         "raw": "<base64-blob>"
>>     }
>> }
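A minimal Python sketch of how the envelope above could be built. The function name `wrap_event`, the `instance` argument and the `instance:uuid4` identifier layout are assumptions for illustration only; the proposal explicitly leaves the UUID format open.

```python
import json
import uuid

PROTOCOL_VERSION = 1  # mirrors "version": 1 in the example message


def wrap_event(payload, instance="cert_at", parent=None):
    """Wrap a flat IntelMQ event dict in a meta/payload envelope.

    Hypothetical helper: the identifier format is still to be decided.
    """
    current = f"{instance}:{uuid.uuid4()}"
    return {
        "meta": {
            "version": PROTOCOL_VERSION,
            # If no parent is given, the event is its own parent.
            "uuid": {"current": current, "parent": parent or current},
            "type": "event",
            "format": "intelmq",
        },
        "payload": payload,
    }


message = wrap_event({"source.ip": "127.0.0.1", "source.fqdn": "example.com"})
encoded = json.dumps(message)  # plain JSON, without the explanatory comments
```

A derived event would pass `parent=message["meta"]["uuid"]["current"]`, which gives the git-like parent link described earlier.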
>>
>> Tell us your opinion about adding non-standardized meta-information
>> fields (e.g. RTIR ticket number, origin, other local contact
>> information, and so on).
>>
>> #### The UUID
>> For the UUID there are multiple options:
>> 1. Generate a random 128 bit UUID
>> 2. A list of entities, which dealt with this event already. For
>> example if an event was passed on from cert-at to cert-ee, the field
>> could look like `!cert-at!cert-ee`. A message sending loop can be
>> detected if the own name is already in this field upon reception.
>> 3. Using CyCat: `publisher-short-name:project-short-name:UUID`. For
>> example: `cert-at:intelmq:72ddb00c-2d0a-4eea-b7ac-ae122b8e6c3b`, or
>> `cert-pl:n6:f60c9fb9-81f9-4e0b-8a44-ea41326a15b3`. Some more research
>> and discussion is required before the implementation of this option.
>> Have a look at https://www.cycat.org/services/concept/ for more details.
>> 4. A hash: a benefit of using a hash is that it can be recalculated
>> on every IntelMQ instance.
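Options 2 and 4 can be sketched in a few lines of Python. Both helper names are hypothetical, and the choice of SHA-256 over sorted-key JSON is an assumption; the proposal does not say which hash or which canonical encoding would be used.

```python
import hashlib
import json


def append_hop(path, own_name):
    """Option 2: append our name to a `!a!b` path, refusing on a loop."""
    hops = [hop for hop in path.split("!") if hop]
    if own_name in hops:
        raise ValueError(f"message loop detected: {own_name} already in {path!r}")
    return f"{path}!{own_name}"


def payload_hash(payload):
    """Option 4: hash a canonical JSON encoding of the payload.

    Sorting keys and fixing separators makes the result independent of
    dict ordering, so every instance computes the same value.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Note that a pure content hash identifies two independently observed but identical events as the same event, which may or may not be desired.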
>>
>> ### Exporting events to other systems
>> In IntelMQ 2.x the events consist only of the "payload" and carry no
>> meta information. For local storage such as file output or databases,
>> the meta information may not be relevant in some use cases. So it
>> needs to be possible to export events *without* meta information,
>> which is also the backwards-compatible behaviour.
>>
>> The "type" field exists in the current format as "__type" in the flat
>> payload structure. In the output bots there's currently a boolean
>> parameter `message_with_type` to include the field `__type` in the
>> "export".
>> For optionally exporting meta-information like uuid or format, a
>> similar logic could be used.
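One way such an opt-in export could look, as a rough sketch. The function `export_event`, the flag `with_meta` and the field names `__uuid` and `__format` are assumptions modelled on the existing `__type` / `message_with_type` convention; they are not existing IntelMQ parameters.

```python
def export_event(message, with_meta=False):
    """Flatten an envelope for output bots.

    By default only the payload is returned (backwards-compatible).
    With with_meta=True, selected meta fields are folded into the flat
    structure using double-underscore keys, like the existing `__type`.
    """
    exported = dict(message["payload"])  # copy, leave the input untouched
    if with_meta:
        meta = message["meta"]
        exported["__type"] = meta["type"]
        exported["__uuid"] = meta["uuid"]["current"]  # hypothetical field
        exported["__format"] = meta["format"]  # hypothetical field
    return exported
```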
>>
>> ### How can data exchange work?
>> This depends on how IntelMQ instances communicate: either
>> peer-to-peer or via a central data hub. Both have pros and cons.
>>
>> #### P2P (peer-to-peer)
>> Decentralized network
>> + Less downtime: an outage of one instance does not affect the
>> whole network
>> + Better privacy: data is not shared to an unrelated instance
>> + More secure: data can optionally be encrypted (key-exchange between
>> instances?)
>> + Decentralized and local maintenance
>> ~ Network latency depends on server locations
>> - Networking issues may occur
>>
>> How data exchange between instances could look:
>> 1) Instance A has events which should be relayed to Instance B & C,
>> because A is not sure who the actual receiver should be
>> 2) Instance A ensures all messages have a UUID
>> 3) Instance A sends the data to Instance B & Instance C
>> 4) Instance B checks the data, concludes that it is meant for
>> Instance C, and forwards it
>> 5) Instance C receives data from Instance A & Instance B
>> 6) Instance C checks the UUID, which is the same, & drops the
>> duplicate message from Instance B
>>
>> #### (Central) Data hub
>> + Less maintenance: Is maintained by the hub administrator
>> + Central data storage (reports can optionally be cached to be
>> downloaded later)
>> ~ Central data analysis (e.g. statistics) is possible
>> ~ Network latency depends on server locations
>> - Single point of failure: if network problems occur, no exchange is possible
>>
>> As already seen above, data exchange here would be less complicated.
>> The sending may look like:
>> 1) Instance A has events which should be relayed to Instance B (e.g.
>> different country)
>> 2) Instance A ensures all messages have a UUID
>> 3) Instance A sends these messages to the data hub
>>
>> The reception side can look like:
>> 1) Instance B connects to central instance
>> 2) Instance B queries and downloads all available messages
>> 3) Upon reception, all messages are de-duplicated based on the UUID:
>>   a) If the UUID is already known, discard the message
>>   b) If the UUID has not been seen before, continue with processing
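The de-duplication step on the receiving side could be sketched as follows. The class name `Deduplicator` is hypothetical, and the in-memory set is an assumption made for brevity; a real deployment would presumably need persistent storage with some expiry, which the proposal does not specify.

```python
class Deduplicator:
    """Remember the UUIDs seen so far and reject repeated messages."""

    def __init__(self):
        self._seen = set()

    def accept(self, message):
        """Return True for a new UUID, False if it was already seen."""
        uid = message["meta"]["uuid"]["current"]
        if uid in self._seen:
            return False  # case a): already known, discard
        self._seen.add(uid)
        return True  # case b): not seen before, continue processing


def receive(messages, dedup):
    """Keep only the messages whose UUID has not been seen before."""
    return [message for message in messages if dedup.accept(message)]
```

The same check covers the peer-to-peer case, where one instance receives the same event both directly and via a relaying instance.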
>>
>> To sum up, both exchange variants are useful. More research is
>> needed, e.g. on a mixed infrastructure with centralized parts that
>> can also be decentralized. However, this is neither the purpose
>> nor the aim of this IEP.
>>
>>
>
-- 
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
