Hi,
And: what use cases do we have?
My particular use case at the moment is to have lists of IP addresses, IP networks and possibly FQDNs.
How to define the types of the values inside the list?
The values will be those that conform to IPAddress, IPNetwork and FQDN for their respective type. They could be represented as a vertical-bar- or comma-separated list within a string, or as a proper Python list.
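To make the two representations concrete, here is a minimal sketch that accepts either a delimited string or a Python list and checks every item against its type. It uses only the standard library; the names parse_multi_value and VALIDATORS and the simplified FQDN pattern are assumptions for illustration, not IntelMQ's harmonization code.

```python
import ipaddress
import re

# Hypothetical, simplified FQDN pattern (assumption, not IntelMQ's FQDN check).
_FQDN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def _is_ip_address(value):
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def _is_ip_network(value):
    # Require an explicit prefix so a bare address is not taken for a network.
    if "/" not in value:
        return False
    try:
        ipaddress.ip_network(value, strict=False)
    except ValueError:
        return False
    return True


def _is_fqdn(value):
    return _FQDN_RE.match(value) is not None


# Type names follow the ones used in this thread.
VALIDATORS = {
    "IPAddress": _is_ip_address,
    "IPNetwork": _is_ip_network,
    "FQDN": _is_fqdn,
}


def parse_multi_value(raw, value_type):
    """Return a list of validated strings from a delimited string or a list."""
    if isinstance(raw, str):
        # Accept both vertical bars and commas as separators.
        items = [part.strip() for part in re.split(r"[|,]", raw)]
    else:
        items = [str(part).strip() for part in raw]
    items = [item for item in items if item]

    check = VALIDATORS[value_type]
    invalid = [item for item in items if not check(item)]
    if invalid:
        raise ValueError("invalid %s value(s): %r" % (value_type, invalid))
    return items
```

With this, parse_multi_value("192.0.2.1|198.51.100.7", "IPAddress") and parse_multi_value(["192.0.2.1", "198.51.100.7"], "IPAddress") give the same list, so the parser could hand over whichever form is more convenient.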
What should the "API" look like?
The API should function as a regular Python list. That being said, I don’t imagine doing any complex operations with the list; I will have access to all the values within the parser and will be able to add them all to the event at once.
When should the list be converted to a string (or maybe also a JSON-list)?
My main usage will be outputting the events to Mongo; in that case a JSON-list will work. But overall I am happy to use strings to represent the list for all outputs if it makes it easier. I can simply split the values out after receiving the event on the other end.
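As a rough sketch of the two output options, the helpers below turn a list into either a JSON list or a delimited string, and split the string again on the receiving side. The function names are hypothetical and this is not how any IntelMQ output bot is implemented.

```python
import json


def to_json_list(values):
    """Serialize the values as a JSON list, e.g. for document stores."""
    return json.dumps(list(values))


def to_delimited_string(values, separator=","):
    """Join the values into one string for outputs that only take strings."""
    values = list(values)
    for value in values:
        # A value containing the separator could not be split back correctly.
        if separator in value:
            raise ValueError("value %r contains the separator" % (value,))
    return separator.join(values)


def from_delimited_string(text, separator=","):
    """Split the string back into a list on the receiving end."""
    return [part for part in text.split(separator) if part]
```

For IP addresses, networks and FQDNs a comma can never be part of a value, so the string form round-trips safely; for free-text fields the JSON list would be the safer choice.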
My end use case is marking up the events as indicators in STIX. One of the team's most vital sources will have many source IPs/Networks/FQDNs per indicator, and thus I would like to be able to send a list of these values as one event.
Regards,
Alex
From: Sebastian Wagner [mailto:wagner@cert.at]
Sent: Wednesday, 8 November 2017 10:59 PM
To: Knight, Alexander; intelmq-dev@lists.cert.at
Subject: Re: [Intelmq-dev] Data Harmonization - Fields with multiple values
Hi,
On 11/03/2017 06:26 AM, Knight, Alexander wrote:
At the Deepsec conference Sebastian mentioned updating the harmonization to allow for fields with multiple values. Has there been any progress on this issue?
The use case was the field abuse_contact which could be a list and then be concatenated (if necessary) with commas.
Technically it is not hard to do it. In the develop branch I already have something similar (and more complex): a dictionary type named JSONDict.
So, not directly, but there have been some changes that should make this easier.
There are some questions popping up that need to be clarified first:
* How to define the types of the values inside the list? E.g. for the abuse_contact it has to be a list of strings/email addresses
* What should the "API" look like, or in other words: what should happen for the "in" and "setitem" operations, etc.?
* When should the list be converted to a string (or maybe also a JSON-list)? E.g. for postgres output the abuse_contact could either be a json-list or a comma separated list, depending on the table's definition, but for NoSQL-databases and files it can be just the list itself.
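To give the three questions above something concrete to react to, here is one possible shape, purely as a sketch: a list-like class that validates items whenever they are written, supports "in" like a normal list, and is only converted to a string on request. TypedList and looks_like_email are hypothetical names, the email check is deliberately naive, and none of this is existing IntelMQ code.

```python
from collections.abc import MutableSequence


class TypedList(MutableSequence):
    """Hypothetical list-like field that validates every item on write."""

    def __init__(self, is_valid, items=()):
        self._is_valid = is_valid
        self._items = []
        self.extend(items)  # extend() goes through insert(), so items are checked

    def _check(self, value):
        if not self._is_valid(value):
            raise ValueError("invalid list item: %r" % (value,))
        return value

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._items[index] = [self._check(v) for v in value]
        else:
            self._items[index] = self._check(value)

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, self._check(value))

    def to_string(self, separator=","):
        """Convert to a delimited string only when an output needs one."""
        return separator.join(self._items)


def looks_like_email(value):
    # Deliberately naive check, for illustration only.
    return (
        isinstance(value, str)
        and value.count("@") == 1
        and not value.startswith("@")
        and not value.endswith("@")
    )


contacts = TypedList(looks_like_email, ["abuse@example.com"])
contacts.append("cert@example.org")   # validated via insert()
contacts[0] = "soc@example.com"       # validated via __setitem__
```

Deriving from MutableSequence means append, extend, "in" and iteration come for free from the five methods defined here, and every write path goes through the same check.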
And: what use cases do we have? That's good to know before thinking about how we implement that all:
We will require multiple values for some fields in our events,
What is in these fields? (type and/or example values) Where do you put that and how do you want to work with it (inside intelmq)?
I'd like to hear opinions of other users and developers too!
Sebastian
P.S.: I do have specific ideas, but don't want to bias others ;)
--
// Sebastian Wagner <wagner@cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// An initiative of nic.at GmbH - https://www.nic.at/
// Company register number 172568b, Regional Court Salzburg