Dear allies,
The discussion around the IEP04 proposal, adding meta-information to IntelMQ messages, has stalled over the last months - first because of the time-intensive IntelMQ 3.0 release preparations and then because of the vacation season.
Here is the current proposal: https://github.com/certtools/ieps/tree/main/004#readme
Aaron, Sebastian Waldbauer and I worked on it over the summer and identified two open issues to be discussed:
1. The exact format of the meta-information and how to name and structure the fields. AIL made the first move and now uses a format similar to the previously proposed Variant "A". The IEP04 document contains the current proposal, which is in line with the AIL format: https://github.com/certtools/ieps/tree/main/004#user-content-variant-ail If there are no other proposals, this will most probably be the way to go.
2. The format of the UUID with which we want to uniquely identify IntelMQ events. We don't necessarily need to use the UUIDv4 format, which represents pure randomness; there are also other options which include the time and are even /time-sortable/. Sebastian Waldbauer analysed a couple of options and summarised his results in this document:
https://github.com/certtools/ieps/blob/main/004/UUID.md
Please let us know your opinion on the different UUID options.
cheers Sebastian
Dear Sebastian and all,
Thank you for your efforts to improve IntelMQ. I am trying to catch up on the discussion (but am still a long way behind...)
Regarding IEP004, I'd second the current proposal and Variant AIL. That is natural and easy to understand.
But don't we need a timestamp in the metadata? I mean something like this:
{
  "format": "intelmq",
  "version": 1,
  "type": "event",
  "meta": {
    "intelmq:uuid": "<event-uuid-1>",
    "intelmq:uuid_org": "<org-uuid-1>",
    "intelmq:timestamp": "<creation time of this message>",   <== here
    :
With this timestamp, we don't need to consider a time-sortable UUID but just use UUID-whatever.
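For illustration, here is a minimal Python sketch of a message carrying the suggested timestamp. Only the field names shown in the snippet above come from this thread; the "payload" key, the build_message helper and the ISO 8601 UTC encoding are assumptions made for the example, not part of the IEP04 proposal.

```python
import json
import uuid
from datetime import datetime, timezone


def build_message(payload: dict, org_uuid: str) -> dict:
    """Wrap an event payload in the envelope sketched in this thread."""
    return {
        "format": "intelmq",
        "version": 1,
        "type": "event",
        "meta": {
            "intelmq:uuid": str(uuid.uuid4()),
            "intelmq:uuid_org": org_uuid,
            # Suggested addition: creation time of this message.
            # Assumption: encoded as ISO 8601 in UTC.
            "intelmq:timestamp": datetime.now(timezone.utc).isoformat(),
        },
        # Assumption: the key name "payload" is illustrative only.
        "payload": payload,
    }


message = build_message({"source.ip": "192.0.2.1"}, org_uuid=str(uuid.uuid4()))
print(json.dumps(message, indent=2))
```

With an explicit timestamp like this, the event UUID itself can stay purely random.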
If you've already discussed this and decided not to have it, please ignore this and accept my apologies for rehashing an old discussion.
Thank you very much.
Best Regards,
Dear Moto,
First of all, thanks for providing feedback!
On 9/7/21 2:40 AM, moto kawasaki wrote:
Regarding IEP004, I'd second the current proposal and Variant AIL. That is natural and easy to understand.
Thanks.
But don't we need to have a timestamp in the meta-data ? I mean something like this;
{ "format": "intelmq", "version": 1, "type": "event", "meta": { "intelmq:uuid": "<event-uuid-1>", "intelmq:uuid_org": "<org-uuid-1>", "intelmq:timestamp": "<creation time of this message>", <== here :
Every IntelMQ message should already have a /time.source/ field in the payload, so I'm not sure if it's necessary to have it in the metadata as well explicitly. And that overlaps with the next topic:
With this timestamp, we don't need to consider a time-sortable UUID but just use UUID-whatever.
Not necessarily. Events are usually identified in user interfaces and databases by an ID, numeric or alphanumeric. I'm just thinking of MISP, which shows numeric IDs in the event lists. For IntelMQ, similar interfaces exist (https://github.com/Intevation/intelmq-fody/) as well as plain databases. If the data is already automatically time-sortable by the primary identifier, usability could benefit. In some cases performance could increase as well.
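To make the sorting argument concrete, here is a small Python sketch of a time-ordered identifier. The layout is modelled on the UUIDv7 scheme (48-bit millisecond timestamp followed by random bits); it is an illustration only and is not taken from the UUID.md comparison. The function name time_ordered_uuid is hypothetical.

```python
import os
import time
import uuid
from typing import Optional


def time_ordered_uuid(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit ms timestamp, then random bits."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((timestamp_ms & ((1 << 48) - 1)) << 80) | random_bits
    # Overwrite the version nibble (bits 76-79) with 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the variant bits (bits 62-63) with binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


# Two IDs created one millisecond apart sort in creation order, both as
# UUID objects and as plain strings (e.g. as a database primary key).
earlier = time_ordered_uuid(1_631_000_000_000)
later = time_ordered_uuid(1_631_000_000_001)
print(sorted([later, earlier]) == [earlier, later])
print(sorted([str(later), str(earlier)]) == [str(earlier), str(later)])
```

IDs created within the same millisecond are ordered by their random part, so the ordering is only as fine-grained as the timestamp.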
If you've already discussed and decided not to have it, please ignore and receive my apology to rehash old discussion.
No, we haven't discussed that yet :)
best regards Sebastian
Dear Sebastian,
Thanks for your explanation!
There is no need to have a timestamp in the metadata if one already exists elsewhere. I also understand the benefits of a time-sortable UUID. I like it :-)
Thank you very much
Regards,
On 8/09/2021 1:34 am, Sebastian Wagner wrote:
But don't we need to have a timestamp in the meta-data ? I mean something like this;
{ "format": "intelmq", "version": 1, "type": "event", "meta": { "intelmq:uuid": "<event-uuid-1>", "intelmq:uuid_org": "<org-uuid-1>", "intelmq:timestamp": "<creation time of this message>", <== here :
Every IntelMQ message should already have a /time.source/ field in the payload, so I'm not sure if it's necessary to have it in the metadata as well explicitly. And that overlaps with the next topic:
Not specifically for IntelMQ, but I tend to break an event message into at least three timestamps (but possibly more depending on event type):
* actual occurrence time of reported security event (time.source as I'd understand it)
* event package original creation time (the suggested meta.intelmq:timestamp here, which I'd possibly rename to meta.intelmq:creation_timestamp or similar)
* event package system ingestion time (time.observation?)
Best regards,
Chris
From: Chris Horsley chris.horsley@csirtfoundry.com, Date: Sep 08, 2021
Not specifically for IntelMQ, but I tend to break an event message into at least three timestamps (but possibly more depending on event type):
- actual occurrence time of reported security event (time.source as I'd
understand it)
- event package original creation time (the suggested meta.intelmq:timestamp
here, which I'd possibly rename to meta.intelmq:creation_timestamp or similar)
- event package system ingestion time (time.observation?)
Exactly, that came to my mind as well. Not to forget that timestamps are not exact: a portscan or a DDoS has start and end timestamps, not just one.
If we want to have some type of timestamp within the UUID, or at least within the metadata, there should be a deterministic definition of what it should be, taking into account that some of that information may not be available (precise start or end of the attack, etc.). For example, one of:
* always use the detection time
* always use the creation time
* use the first available of: event start, detection window start, event end, detection window end, detection time, event creation time
I think that a timestamp within the metadata is important even if there are (possibly duplicate) timestamps within the event data (payload) itself, if support for other formats (n6, Idea) is considered: some tools within the chain would then not need to understand all of the formats, only the metadata.
And I second the opinion that if timestamp is already in the metadata, UUID4 is fine.
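The "first available" rule from the list above could be sketched in Python roughly as follows. All key names are hypothetical placeholders for the six candidates named in the list, and pick_meta_timestamp is an invented helper, not existing IntelMQ code.

```python
from typing import Mapping, Optional

# Candidate timestamps in priority order.
# Assumption: these key names are placeholders, not defined IntelMQ fields.
PRIORITY = (
    "event_start",
    "detection_window_start",
    "event_end",
    "detection_window_end",
    "detection_time",
    "event_creation_time",
)


def pick_meta_timestamp(times: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the first candidate that is present and non-empty."""
    for key in PRIORITY:
        value = times.get(key)
        if value:
            return value
    return None


known = {
    "event_end": "2021-09-08T14:00:00+00:00",
    "detection_time": "2021-09-08T14:05:00+00:00",
}
print(pick_meta_timestamp(known))
```

Because the order is fixed, two tools that see the same set of timestamps always pick the same one, which is the deterministic behaviour asked for.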
-- Pavel Kácha
On 9/8/21 4:15 PM, Pavel Kácha wrote:
Exactly, came to my mind also. Not to forget that timestamps are not exact - portscan or DDoS have their start and end timestamps, not just one.
If we want to have some type of timestamp within UUID, or at least within metadata, there should be deterministic definition of what is should be, and taking into account that some of that info would not be available (precise start or end of the attack, etc.), for example one of:
- use always detection time
- use always creation time
- use first available: event start, detection window start, event end, detection window end, detection time, event creation time
I think that timestamp within metadata is important even if there are (possibly duplicate) timestamps within event data (payload) itself, if support for other formats (n6, Idea) is considered - some tools within the chain would not need to actually understand all of the formats, but only metadata.
I think the third option best describes a useful behavior in IntelMQ (we have no fields for the detection window, and event start == time.source, creation time ~= time.observation), and it is already used in some parts of the code. Putting six different timestamps in the metadata while not providing that data in the payload itself seems a bit awkward to me.
I also noticed that the AIL format itself doesn't provide a timestamp, I wonder why. Alex, why is that the case?
Sebastian
From: Sebastian Wagner wagner@cert.at, Date: Sep 08, 2021
On 9/8/21 4:15 PM, Pavel Kácha wrote:
Exactly, came to my mind also. Not to forget that timestamps are not exact - portscan or DDoS have their start and end timestamps, not just one.
If we want to have some type of timestamp within UUID, or at least within metadata, there should be deterministic definition of what is should be, and taking into account that some of that info would not be available (precise start or end of the attack, etc.), for example one of:
- use always detection time
- use always creation time
- use first available: event start, detection window start, event end, detection window end, detection time, event creation time
I think that timestamp within metadata is important even if there are (possibly duplicate) timestamps within event data (payload) itself, if support for other formats (n6, Idea) is considered - some tools within the chain would not need to actually understand all of the formats, but only metadata.
I think the third option best describes a useful behavior in IntelMQ (we have no fields for detection window and events start == time.source, creation time ~= time.observation) which is already used in some code parts.
Putting six different timestamps in the metadata while not providing that data in the payload itself seems a bit awkward to me.
Definitely. To make myself clearer: I didn't mean to put all the timestamps into the metadata, just the one that makes the most sense for tools that do not need to look into the payload, and that is clearly defined. My examples are just (some of the) possibilities for how to define it.
-- Pavel
On 9/9/21 9:40 AM, Pavel Kácha wrote:
Putting six different timestamps in the metadata while not providing that data in the payload itself seems a bit awkward to me.
Definitely. To make myself clearer - I didn't mean to put all the timestamps into metadata, just the one, which would make best sense for the tools, which do not need to look into payload, and which would be clearly defined. My examples are just (some of the) possibilities of how to define it.
We're on the same page. My main point is that most of the possible timestamps don't (yet?) have their own fields in the payload and need to be saved in the extra namespace. If we'd use one of them for the timestamp in the metadata, we should first properly define them in the payload.
Sebastian
Hi,
On 9/8/21 7:13 AM, Chris Horsley wrote:
On 8/09/2021 1:34 am, Sebastian Wagner wrote:
But don't we need to have a timestamp in the meta-data ? I mean something like this;
{ "format": "intelmq", "version": 1, "type": "event", "meta": { "intelmq:uuid": "<event-uuid-1>", "intelmq:uuid_org": "<org-uuid-1>", "intelmq:timestamp": "<creation time of this message>", <== here :
Every IntelMQ message should already have a /time.source/ field in the payload, so I'm not sure if it's necessary to have it in the metadata as well explicitly. And that overlaps with the next topic:
Not specifically for IntelMQ, but I tend to break an event message into at least three timestamps (but possibly more depending on event type):
- actual occurrence time of reported security event (time.source as
I'd understand it)
- event package original creation time (the suggested
meta.intelmq:timestamp here, which I'd possibly rename to meta.intelmq:creation_timestamp or similar)
- event package system ingestion time (time.observation?)
Thinking again about the timestamps: Wouldn't it be better to (also?) put the date of the /message creation/ (and/or last updated) into the meta field? That would often be identical to the time.observation if the data is fetched from a feed*, and if the message is passed on from a different system - could also be from another organisation! - it is inherited.
On the other hand, I think that the meta-field should be as small as possible, because it adds a significant amount of relative data size to the event which has /enormous/ impacts on memory consumption. I don't want to specifically advocate time-based UUID formats, but using them would add the message creation time for no overhead costs.
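As an illustration of getting the creation time "for free", the following Python sketch recovers the timestamp embedded in a time-based UUID. It uses UUIDv1 only because the standard library can generate it; UUIDv1 embeds the time but does not sort by time as a string, so it is not meant as a recommendation for one of the candidate formats. The helper name creation_time is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUIDv1 timestamps count 100-nanosecond intervals since 1582-10-15.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def creation_time(identifier: uuid.UUID) -> datetime:
    """Recover the creation time stored in a UUIDv1."""
    if identifier.version != 1:
        raise ValueError("only UUIDv1 carries this timestamp layout")
    # Convert 100 ns ticks to microseconds (sub-microsecond part is dropped).
    return GREGORIAN_EPOCH + timedelta(microseconds=identifier.time // 10)


event_id = uuid.uuid1()
print(event_id, creation_time(event_id).isoformat())
```

No extra field is stored here: the timestamp is part of the 128 bits the identifier occupies anyway.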
As an aside: To keep the memory consumption and processing-times low, we could also implement a minimal-meta-mode, which only adds the meta-fields when the message is passed on to other systems and exclusively keeps the parts of meta which are absolutely necessary for inter-bot communication (in particular the type and event UUID)
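A rough Python sketch of how such a minimal-meta-mode might look. Everything here is hypothetical: the function names, the "payload" key and the choice of which fields count as essential are assumptions for illustration, not an agreed design.

```python
# Assumption: only the event UUID is needed in meta between bots.
ESSENTIAL_META = ("intelmq:uuid",)


def to_minimal(message: dict) -> dict:
    """Internal form: keep only the type, the essential meta and the payload."""
    meta = message["meta"]
    return {
        "type": message["type"],
        "meta": {key: meta[key] for key in ESSENTIAL_META if key in meta},
        "payload": message["payload"],
    }


def to_full(message: dict, org_uuid: str) -> dict:
    """Export form: re-add the envelope fields before passing the message on."""
    return {
        "format": "intelmq",
        "version": 1,
        "type": message["type"],
        "meta": {**message["meta"], "intelmq:uuid_org": org_uuid},
        "payload": message["payload"],
    }


internal = to_minimal({
    "format": "intelmq",
    "version": 1,
    "type": "event",
    "meta": {"intelmq:uuid": "<event-uuid-1>", "intelmq:uuid_org": "<org-uuid-1>"},
    "payload": {"source.ip": "192.0.2.1"},
})
exported = to_full(internal, org_uuid="<org-uuid-1>")
print(internal)
print(exported)
```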
cheers Sebastian
* not if the data comes from another system, e.g. a mail or ticketing system; in that case time.observation is the reception timestamp of the source
On 9/6/21 6:59 PM, Sebastian Wagner wrote:
[...]
- The exact format of the meta-information and how to name and
structure the fields. AIL made the first move and now uses a format similar to the previously proposed Variant "A". The IEP04 document contains the current proposal which is in line with the AIL format: https://github.com/certtools/ieps/tree/main/004#user-content-variant-ail [...]
As requested by Aaron I have split the AIL-based variant into two, the first is a mix of AIL and the original Variant A and the second is a combination with Variant B (using RDF to represent links).
cheers Sebastian