[Intelmq-dev] IntelMQ Data Harmonization (DHO) - malware.hash key (issue 732)

Pavel Kácha ph at cesnet.cz
Thu Feb 9 16:42:02 CET 2017


   Hi, as I see no replies, my 2 cents: If you definitely need several
different types of hashes for one content in the 1.0 timeframe, where
multivalues are a hurdle, staying with explicit keys for the various hashes
seems a rational solution to me.  However, it may be wise to do a bit of
digging (maybe based on the output of the currently supported parsers?) into
which types of hashes are already in the wild and used, so users don't end
up resorting to something generic too soon.

Cheers
-- Pavel
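A first-pass survey of the kind suggested above could start by bucketing the bare hash values that existing parsers emit by digest length. This is a hypothetical sketch: the function names and the length-to-algorithm table are illustrative assumptions, and length alone cannot tell MD5 apart from other 128-bit digests, so the result is only a rough inventory.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed mapping from hex-digest length to the most common algorithm of that size.
# Length is ambiguous (e.g. MD4 and NTLM digests are also 32 hex chars).
HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def guess_hash_type(value: str) -> str:
    """Guess a hash type for a bare digest string (heuristic, hypothetical helper)."""
    value = value.strip()
    if HEX_RE.fullmatch(value):
        return HEX_LENGTHS.get(len(value), "unknown-hex")
    # Non-hex encodings (base32, crypt strings, ...) need manual inspection.
    return "unknown"


def survey(values: Iterable[str]) -> Counter:
    """Count guessed hash types across values collected from parser output."""
    return Counter(guess_hash_type(v) for v in values)


if __name__ == "__main__":
    sample = [
        "5307d294b6ccd9854f2deed8c1628b72",          # 32 hex chars
        "da39a3ee5e6b4b0d3255bfef95601890afd80709",  # 40 hex chars
        "LBPI666ED2QSWVD3VSO5BG5R54TE22QL",          # not hex
    ]
    print(survey(sample))
```

Values classified as "unknown" are exactly the ones worth a closer look before deciding whether a generic key is needed.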

> From: Tomás Lima <synchroack at protonmail.ch>, Date: Feb 01, 2017
>
>    Thank you Pavel for the excellent feedback.
>    Well, I really want to have the option to specify sha1, sha256 and md5
>    in the same message, since I'm planning to use results from sources
>    like VirusTotal, which will be useful for correlating information, as
>    Aaron mentioned on that issue:
>    "assume you are given a hash (sha1) of a piece of malware and you want to
>    find it in the events table. However, you only stored the md5 since that
>    is what you received even though the sender sent you both fields (sha1
>    and md5 - such as the n6 feed). Then you can not ever find the right
>    entry again."
>    I vote for:
>    {
>        ...
>        "malware.hash.md5": "<md5 hash>",
>        "malware.hash.sha1": "<sha1 hash>",
>        "malware.hash.sha256": "<sha256 hash>"
>        ...
>    }
>    instead of:
>    {
>        ...
>        "malware.hash": "md5:<md5 hash>,sha1:<sha1 hash>,sha256:<sha256
>    hash>"
>        ...
>    }
>    For me, we can evaluate this issue again when we start discussing the
>    Full-JSON vs "single value" format for milestone v1.1, but for now I
>    would go with the approach shown above.
>    What do you guys say?
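To illustrate Aaron's scenario with the explicit-key layout Tomás proposes, the sketch below converts the comma-separated variant into one key per hash type, after which a sha1 lookup is a plain key access. The helper names are hypothetical and not part of IntelMQ; the input format is taken from the two JSON snippets above, and the digests are placeholders.

```python
from typing import Dict, List


def split_hash_field(value: str) -> Dict[str, str]:
    """Turn 'md5:<h>,sha1:<h>' into {'malware.hash.md5': '<h>', ...} (hypothetical helper)."""
    result: Dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        algo, sep, digest = item.partition(":")
        if not sep or not algo or not digest:
            raise ValueError(f"entry without '<type>:' prefix: {item!r}")
        result[f"malware.hash.{algo.strip().lower()}"] = digest.strip()
    return result


def find_events_by_hash(events: List[Dict[str, str]], algo: str, digest: str) -> List[Dict[str, str]]:
    """Return events whose malware.hash.<algo> key matches digest (case-insensitive hex)."""
    key = f"malware.hash.{algo}"
    return [e for e in events if e.get(key, "").lower() == digest.lower()]


if __name__ == "__main__":
    event = split_hash_field("md5:5307d294b6ccd9854f2deed8c1628b72,"
                             "sha1:da39a3ee5e6b4b0d3255bfef95601890afd80709")
    print(find_events_by_hash([event], "sha1", "da39a3ee5e6b4b0d3255bfef95601890afd80709"))
```

Once each hash lives in its own key, neither the lookup nor an enrichment bot has to parse the value, and a sha1 received alongside an md5 is no longer lost.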
> 
>      -------- Original Message --------
>      Subject: Re: [Intelmq-dev] IntelMQ Data Harmonization (DHO) -
>      malware.hash key (issue 732)
>      Local Time: February 1, 2017 10:13 AM
>      UTC Time: February 1, 2017 10:13 AM
>      From: ph at cesnet.cz
>      To: Tomás Lima <synchroack at protonmail.ch>
>      L. Aaron Kaplan <kaplan at cert.at>, intelmq-dev at lists.cert.at
>      Hi,
>      hmm, looking into the uri/url/urn terms, I have used the wrong one,
>      sorry for that. Because (from the name of the key in Idea) we know it
>      is a hash, we do not use the "urn:" (or "urn:hash") prefix, to keep
>      things simple, so in reality it is a URI.
>      So we use the hash name directly as a URI scheme. As a scheme we use
>      well-known hash names (md5, sha1), or a reasonably descriptive name for
>      unusual hashes (btih for BitTorrent Info Hash, which can be encoded in
>      various ways), or just "hash:" as a fallback where we do not know the
>      internal format of the hash (but we try not to overuse that, and rather
>      find out some more unique/descriptive identifier). Example:
>      Hash: [
>      "md5:5307d294b6ccd9854f2deed8c1628b72",
>      "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
>      "btih:QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q",
>      "passwd:$1$Jrbw4gbM$Er6MejOKAXqT.VTII8vGV%2F"
>      ]
>      So, to be precise, we do not try to overstandardize (various protocols
>      we might encounter in the future may use various convoluted types of
>      hashes), and in reality it is not even a real URI, because we do not
>      register the prefixes at IANA.
>      But for our intents and purposes it looks like a URI and quacks like a
>      URI, so we can keep things simple and work with it as with a URI. :)
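The URI-like convention Pavel describes (scheme = hash name, remainder = encoded value, "hash:" as the generic fallback) can be handled by splitting on the first colon only, which matters for values such as the crypt-style passwd entry that contain further delimiters. This is an illustrative sketch under that reading, not code from Idea or IntelMQ; the scheme check follows the RFC 3986 scheme grammar, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def parse_hash_uri(value: str) -> Tuple[str, str]:
    """Split 'scheme:rest' at the first colon; schemes are case-insensitive."""
    scheme, sep, rest = value.partition(":")
    if not sep or not rest or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"not a scheme-prefixed hash: {value!r}")
    return scheme.lower(), rest


def values_for_scheme(hashes: List[str], scheme: str) -> List[str]:
    """Return the encoded values of all entries using the given scheme."""
    return [rest for s, rest in map(parse_hash_uri, hashes) if s == scheme.lower()]


if __name__ == "__main__":
    idea_hashes = [
        "md5:5307d294b6ccd9854f2deed8c1628b72",
        "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
        "btih:QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q",
        "passwd:$1$Jrbw4gbM$Er6MejOKAXqT.VTII8vGV%2F",
    ]
    for entry in idea_hashes:
        print(parse_hash_uri(entry))
    print(values_for_scheme(idea_hashes, "md5"))
```

The parser deliberately does not interpret the encoding after the colon (the sha1 value in the example is not hex); that is left to consumers that know the scheme.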
>      Regarding lists/arrays - you have mentioned
>      https://github.com/certtools/intelmq/issues/394
>      so if I understand correctly, you are considering multivalues for 1.1.
>      From that point of view, I would consider leaving the hash as a single
>      value for 1.0 - multiple identifiers for one content do happen, but
>      rarely; and you are already keeping things simple with a single source
>      ip/single destination ip, for example (of course only in case there is
>      not somebody who _really_ needs the multivalue hash field).
>      Then in 1.1 you can work out the list/multivalue issues and incorporate
>      all that in one go.
>      -- Pavel
>      P.S.: Maybe there are people here who don't know what I'm talking about
>      when referencing Idea, so here's what we are using:
>      https://idea.cesnet.cz/en/index
>      > From: Tomás Lima <synchroack at protonmail.ch>, Date: Feb 01, 2017
>      >
>      > Pavel, can you confirm which "message" format you are currently
>      > using with URNs? Is it like the following?
>      > {
>      >     "malware.hash": [
>      >         "urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72",
>      >         "urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL"
>      >     ]
>      > }
>      > The current issue in IntelMQ is that we cannot use lists as values,
>      > but if that were the case, the proposal would suit perfectly.
>      > Aaron and Sebastian, I see one issue:
>      > If I have a bot like VirusTotal which uses the malware.hash field to
>      > query the API, how should I write the bot, since the hashes are in
>      > one comma-separated field as you guys mentioned...? IMHO we should
>      > add one field per hash type (they do not change so quickly), and by
>      > the time the next hash type comes along, I expect that we as a
>      > community will have a final answer about Full-JSON support versus
>      > "single value" formats like the one we currently support.
>      > I think it's time to decide regarding the malware keys. How do you
>      > guys want to proceed in the end?
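With one key per hash type, the enrichment bot Tomás describes only has to pick which available digest to submit. The sketch below assumes a preference for the strongest digest present (sha256, then sha1, then md5); that ordering and the function name are assumptions, and no actual VirusTotal API call is shown.

```python
from typing import Dict, Optional, Tuple

# Assumed preference order: prefer the strongest digest present in the event.
PREFERENCE = ("sha256", "sha1", "md5")


def pick_lookup_hash(event: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (algorithm, digest) for the preferred hash present, or None."""
    for algo in PREFERENCE:
        digest = event.get(f"malware.hash.{algo}")
        if digest:
            return algo, digest
    return None


if __name__ == "__main__":
    event = {
        "malware.hash.md5": "5307d294b6ccd9854f2deed8c1628b72",
        "malware.hash.sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    }
    print(pick_lookup_hash(event))  # the sha1 entry, since no sha256 is present
```

With the comma-separated variant, the same bot would first have to parse the value back into type/digest pairs before it could make this choice.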
>      >
>      > -------- Original Message --------
>      > Subject: Re: [Intelmq-dev] IntelMQ Data Harmonization (DHO) -
>      > malware.hash key (issue 732)
>      > Local Time: January 17, 2017 11:08 AM
>      > UTC Time: January 17, 2017 11:08 AM
>      > From: kaplan at cert.at
>      > To: Sebastian Wagner <wagner at cert.at>
>      > intelmq-dev at lists.cert.at
>      > > On 11 Jan 2017, at 10:36, Sebastian Wagner <wagner at cert.at> wrote:
>      > >
>      > > I also think that adding one field per hash type is not feasible, as
>      > > there are a lot of hash types and they change over time. That's why
>      > > we used malware.hash and the Crypt (C) names.
>      > > I wasn't aware of URNs at that time, and they are definitely better -
>      > > easier to understand and supporting more hash types. Consequently
>      > > malware.hash needs to be a list (could be made comma separated for
>      > > postgres?).
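If malware.hash becomes a list but has to be stored in a single text column, a comma-joined encoding round-trips only as long as no entry is empty or contains a comma. The sketch below makes that constraint explicit; the function names are hypothetical. PostgreSQL also offers native array columns (for example text[]), which avoid the delimiter question altogether.

```python
from typing import List


def to_db_text(hashes: List[str]) -> str:
    """Join hash identifiers into one comma-separated string for a text column."""
    for h in hashes:
        if not h or "," in h:
            # An empty entry or an embedded comma would not survive the round trip.
            raise ValueError(f"cannot encode {h!r} in a comma-separated column")
    return ",".join(hashes)


def from_db_text(text: str) -> List[str]:
    """Split the stored string back into the list ('' decodes to [])."""
    return text.split(",") if text else []


if __name__ == "__main__":
    stored = to_db_text(["md5:5307d294b6ccd9854f2deed8c1628b72",
                         "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL"])
    print(stored)
    print(from_db_text(stored))
```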
>      > >
>      > agreed
>      > > Sebastian
>      > >
>      > >
>      > > On 01/06/2017 09:47 AM, Pavel Kácha wrote:
>      > >> Hi,
>      > >>
>      > >> again, just speaking based on our experience - in a year or two
>      > >> there will be another set of popular hashes, and you will probably
>      > >> start considering adding further explicit keys (malware.hash.newone),
>      > >> requiring a change to the harmonization in the process.
>      > >> We have also found that types of hashes which are not in a standard
>      > >> format, but have their own intrinsic unextractable properties,
>      > >> appear over time. This could justify giving them their own "name",
>      > >> for example the BitTorrent BTIH hash.
>      > >> We also thought that the hash type is part of the information, and
>      > >> thus should be part of the data field, not the key name.
>      > >> So, we have just used one key, using solely the URN namespace for
>      > >> adding new hash types.
>      > >>
>      > >> (It is also necessary to say that one content can be identified by
>      > >> more than one hash, so you may find over time that a single scalar
>      > >> field may not be enough. But I digress here. :) )
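Pavel's point that the hash type belongs in the value rather than in the key name can be illustrated by converting the explicit-key layout into a single list of scheme-prefixed strings, and by noting that a fixed key set rejects a new type (such as btih) until the harmonization is changed. Both helpers and the allowed-key set are hypothetical, for illustration only.

```python
from typing import Dict, List

PREFIX = "malware.hash."
# Hypothetical fixed harmonization for the explicit-key approach.
ALLOWED_KEYS = {PREFIX + "md5", PREFIX + "sha1", PREFIX + "sha256"}


def keys_to_uri_list(event: Dict[str, str]) -> List[str]:
    """Move the hash type from the key name into the value: 'type:digest'."""
    return sorted(
        f"{key[len(PREFIX):]}:{value}"
        for key, value in event.items()
        if key.startswith(PREFIX) and value
    )


def unknown_explicit_keys(event: Dict[str, str]) -> List[str]:
    """Keys that a fixed explicit-key harmonization would reject."""
    return sorted(k for k in event if k.startswith(PREFIX) and k not in ALLOWED_KEYS)


if __name__ == "__main__":
    event = {
        "malware.hash.md5": "5307d294b6ccd9854f2deed8c1628b72",
        "malware.hash.btih": "QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q",
    }
    print(unknown_explicit_keys(event))  # btih needs a harmonization change
    print(keys_to_uri_list(event))       # the list form accepts it as-is
```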
>      > >>
>      > >> Cheers
>      > >> -- Pavel
>      > >
>      > > --
>      > > // Sebastian Wagner <wagner at cert.at> - T: +43 1 50564167201
>      > > // CERT Austria - http://www.cert.at/
>      > > // Eine Initiative der nic.at GmbH - http://www.nic.at/
>      > > // Firmenbuchnummer 172568b, LG Salzburg
>      > >
>      > >
>      > > _______________________________________________
>      > > Intelmq-dev mailing list
>      > > Intelmq-dev at lists.cert.at
>      > > http://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev
>      > --
>      > // CERT Austria
>      > // L. Aaron Kaplan <kaplan at cert.at>
>      > // T: +43 1 505 64 16 78
>      > // http://www.cert.at
>      > // Eine Initiative der nic.at GmbH
>      > // http://www.nic.at/ - Firmenbuchnummer 172568b, LG Salzburg