IntelMQ Data Harmonization (DHO) - malware.hash key (issue 732) - IntelMQ-dev

Tomás Lima

30 Dec 2016

1:57 a.m.

Folks,

In the current DHO there are three fields related to malware hashes ('*malware.hash*', '*malware.hash.md5*' and '*malware.hash.sha1*'), but one of them ('*malware.hash*') is not compliant with the current internal message structure (technical details can be found in issue 732: https://github.com/certtools/intelmq/issues/732#issuecomment-269602721).

Since this is a bug that needs to be fixed and it affects the DHO, I would like to propose the three approaches that I see (maybe there are more...) and get your feedback so that we can reach an agreement.

*Approaches:*

1. Rename the key 'malware.hash' to something like 'malware.hash.other', for situations where a feed provides a different type of hash.
2. Remove the key 'malware.hash' and keep the other two.
3. Remove the keys 'malware.hash.md5' and 'malware.hash.sha1' and use only the key 'malware.hash' for all types of hash. With this approach, if a feed provides both an md5 and a sha1 hash in the same event, we will not be able to store both.

The chosen approach is the first one. If you have a chance, please take a few minutes to give your feedback, so we can see whether everyone is comfortable with that.

Thank you in advance.

Cheers!

-- Tomás Lima, »-« SYNchroACK »-«

Dustin Demuth

2 Jan 2017

1:05 p.m.

New subject: [Intelmq-dev] IntelMQ Data Harmonization (DHO) - malware.hash key (issue 732)

Dear all,

happy new year!

Tomás, thanks for your E-Mail.

I also prefer the first approach. Does anyone see a need for, or a way of, adding a "type annotation"?

For instance as a "rule": "When writing to the 'malware.hash.other' field, the type of the hash must be written first, followed by one space and the hash"

Example: malware.hash.other = "SHA256 79e18f00a39f45ca2b87c9d2f27efaa08ef68701d01b2729450900a4651f81b9"
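A minimal sketch of how the rule above could be checked, assuming it were adopted exactly as worded (hash type, one space, hash). The helper name `parse_hash_other` is hypothetical and not part of IntelMQ:

```python
def parse_hash_other(value):
    """Split a '<type> <hash>' string into (type, hash).

    Hypothetical helper, not part of IntelMQ. It only illustrates the
    proposed rule: type first, followed by one space and the hash.
    """
    hash_type, sep, digest = value.partition(" ")
    # Reject values with no separator, an empty part, or extra spaces.
    if not sep or not hash_type or not digest or " " in digest:
        raise ValueError("expected '<type> <hash>', got {!r}".format(value))
    return hash_type, digest


example = "SHA256 79e18f00a39f45ca2b87c9d2f27efaa08ef68701d01b2729450900a4651f81b9"
hash_type, digest = parse_hash_other(example)
print(hash_type, digest)
```

Splitting at the first space keeps the check simple; whether the type should be case-sensitive is left open here, as the rule does not say.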

Best Regards Dustin

-- dustin.demuth@intevation.de https://intevation.de/ OpenPGP key: B40D2EFF Intevation GmbH, Neuer Graben 17, 49074 Osnabrück; AG Osnabrück, HR B 18998 Geschäftsführer: Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner

Thomas Hungenberg

1:54 p.m.

Hi all,

SHA256 hashes are quite common today, so I'd suggest adding malware.hash.sha256 to the DHO in addition to md5 and sha1.

For other hashes that might be used, I like Dustin's suggestion for the malware.hash.other rule.

- Thomas

CERT-Bund Incident Response & Malware Analysis Team

Intelmq-dev mailing list Intelmq-dev@lists.cert.at http://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev

Pavel Kácha

2:43 p.m.

Hello,

my two cents: in Idea we adopted URN syntax (a hash is basically a content-based resource identifier, so the hash name can denote the namespace). It happens to be the same as Dustin's suggestion, just with a colon as the separator:

sha256:79e18f...
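As a rough sketch of the colon-separated form, assuming the hash name is lowercased and used as the prefix: `to_prefixed` and `split_prefixed` are hypothetical helpers (neither Idea nor IntelMQ code) that convert the space-separated form suggested earlier and split a prefixed value again.

```python
def to_prefixed(value):
    """Convert '<TYPE> <hash>' into '<type>:<hash>' (hypothetical helper)."""
    hash_type, sep, digest = value.partition(" ")
    if not sep or not hash_type or not digest:
        raise ValueError("expected '<type> <hash>', got {!r}".format(value))
    # Assumption: prefixes are written in lowercase, as in 'sha256:...'.
    return "{}:{}".format(hash_type.lower(), digest)


def split_prefixed(value):
    """Split '<type>:<hash>' at the first colon into (type, hash)."""
    hash_type, sep, digest = value.partition(":")
    if not sep or not hash_type or not digest:
        raise ValueError("expected '<type>:<hash>', got {!r}".format(value))
    return hash_type, digest


spaced = "SHA256 79e18f00a39f45ca2b87c9d2f27efaa08ef68701d01b2729450900a4651f81b9"
prefixed = to_prefixed(spaced)
print(prefixed)
print(split_prefixed(prefixed))
```

Splitting only at the first colon means the hash part itself may contain colons without breaking the parse.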

Cheers -- Pavel Kácha

Dustin Demuth

5 Jan 2017

10:30 a.m.

Hi,

On Monday, 2 January 2017 14:43:56, Pavel Kácha wrote:

> my few cents - in Idea we adopted URN syntax (as hash is basically content based resource identifier, so the hash name can denote the namespace). Which happens to be the same, just with the colon separator:
>
> sha256:79e18f...

IMHO this syntax is a good idea. Thank you Pavel.

Tomás: Do you need more input?

Ideas so far:

* An additional field for sha256
* A convention to store the hash in ".other" like "sha256:79e18..."

BR Dustin

Tomás Lima

7:37 p.m.

Dustin, yes, the syntax looks good, but how would you apply it to the IntelMQ DHO? Or are you suggesting we use it in the '*malware.hash.other*' key?

From my point of view we should go for:

- '*malware.hash.md5*'
- '*malware.hash.sha1*'
- '*malware.hash.sha256*'
- '*malware.hash.other*' -> using URN syntax

Does that make sense?

-- Tomás Lima, »-« SYNchroACK »-«

Pavel Kácha

6 Jan 2017

9:47 a.m.

Hi,

again, just speaking from our experience: in a year or two there will be another set of popular hashes, and you will probably start considering adding further explicit keys (malware.hash.newone), which requires changing the harmonization each time. We have also found that, over time, types of hashes appear which are not in a standard format but have their own intrinsic, unextractable properties. That can justify giving such a type its own "name", for example the BitTorrent BTIH hash. We also thought that the hash type is part of the information and thus should be part of the data field, not the key name. So we have used just one key, relying solely on the URN namespace for adding new hash types.

(It should also be said that one piece of content can be identified by several hashes, so you may find over time that a single scalar field is not enough. But I digress here. :) )

Cheers -- Pavel

Sebastian Wagner

11 Jan 2017

10:36 a.m.

I also think that adding one field per hash type is not feasible, as there are a lot of hash types and they change over time. That's why we used malware.hash and the Crypt (C) names. I wasn't aware of URN at the time, and it is definitely better: easier to understand, and it supports more hash types. Consequently, malware.hash needs to be a list (could it be made comma-separated for postgres?).
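A small sketch of what "a list, made comma-separated for storage" could look like, assuming prefixed entries such as `md5:...` and assuming no entry ever contains a comma. The function names are hypothetical, not IntelMQ API:

```python
def join_hashes(hashes):
    """Serialize a list of '<type>:<hash>' entries into one string."""
    for item in hashes:
        # Assumption: a comma inside an entry would make the field ambiguous.
        if "," in item:
            raise ValueError("entry contains a comma: {!r}".format(item))
    return ",".join(hashes)


def split_hashes(field):
    """Turn the comma-separated string back into a list of entries."""
    return [item for item in field.split(",") if item]


hashes = [
    "md5:5307d294b6ccd9854f2deed8c1628b72",
    "sha256:79e18f00a39f45ca2b87c9d2f27efaa08ef68701d01b2729450900a4651f81b9",
]
field = join_hashes(hashes)
print(field)
print(split_hashes(field))
```

The round trip is lossless only under the no-comma assumption, which holds for hex and base32 digests but would need checking for less regular hash formats.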

Sebastian

-- // Sebastian Wagner wagner@cert.at - T: +43 1 50564167201 // CERT Austria - http://www.cert.at/ // Eine Initiative der nic.at GmbH - http://www.nic.at/ // Firmenbuchnummer 172568b, LG Salzburg

L. Aaron Kaplan

17 Jan 2017

12:08 p.m.

On 11 Jan 2017, at 10:36, Sebastian Wagner <wagner@cert.at> wrote:

> I also think that adding one field per hash type is not feasible as there are a lot of hash types and they change over time. That's why we used malware.hash and the Crypt (C) names. I wasn't aware of URN at this time and it is definitely better - easier to understand and supports more hash types. Consequently malware.hash needs to be a list (could be made comma separated for postgres?).

agreed

-- // CERT Austria // L. Aaron Kaplan kaplan@cert.at // T: +43 1 505 64 16 78 // http://www.cert.at // Eine Initiative der nic.at GmbH // http://www.nic.at/ - Firmenbuchnummer 172568b, LG Salzburg

Tomás Lima

1 Feb 2017

8:50 a.m.

Pavel, can you confirm in which "message" format you are currently using URNs? Is it like the following?

{ "malware.hash": [ "urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72", "urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL" ] }

The current issue in IntelMQ is that we cannot use lists as values; if we could, the proposal would suit perfectly.

Aaron and Sebastian, I see one issue:

If I have a bot like Virustotal which uses the malware.hash field to query the API, how should I write the bot when the hashes are in one comma-separated field, as you mentioned? IMHO we should add one field per hash type (the set does not change that quickly), and I expect that before the next hash type comes along we as a community will have a final answer on Full-JSON support versus "single value" formats like the one we currently support.
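To illustrate the concern: with a single comma-separated field, a bot that needs one specific hash type has to parse the field first, whereas with one key per type it is a plain lookup. The sketch below uses plain Python dicts rather than IntelMQ message objects, `pick_hash` is a hypothetical helper, and no VirusTotal API call is made:

```python
def pick_hash(field, wanted):
    """Return the hash of type `wanted` from a 'type:hash,type:hash' field."""
    for item in field.split(","):
        hash_type, sep, digest = item.partition(":")
        if sep and hash_type == wanted:
            return digest
    return None


# Variant A: everything in one comma-separated field (needs parsing).
combined = {
    "malware.hash": "md5:5307d294b6ccd9854f2deed8c1628b72,"
                    "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
}
# Variant B: one key per hash type (direct lookup).
per_type = {
    "malware.hash.md5": "5307d294b6ccd9854f2deed8c1628b72",
    "malware.hash.sha1": "LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
}

md5_a = pick_hash(combined["malware.hash"], "md5")
md5_b = per_type.get("malware.hash.md5")
print(md5_a == md5_b)
```

Both variants carry the same information; the difference is only where the parsing effort lands, in every consuming bot or nowhere.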

I think it's time to decide regarding the malware keys. How do you guys want to proceed in the end?

Tomás Lima

9:04 a.m.

Please check also this old issue: https://github.com/certtools/intelmq/issues/394

Pavel Kácha

11:13 a.m.

Hi,

hmm, looking into the URI/URL/URN terms, I have used the wrong one, sorry for that. Because we know it is a hash (from the name of the key in Idea), we do not use a "urn:" (or "urn:hash") prefix, to keep things simple, so in reality it is a URI: we use the hash name directly as the URI scheme. As a scheme we use well-known hash names (md5, sha1), or a reasonably descriptive name for unusual hashes (btih for BitTorrent Info Hash, which can be encoded in various ways), or just "hash:" as a fallback where we do not know the internal format of the hash (but we try not to overuse that and rather find a more unique/descriptive identifier). Example:

Hash: [ "md5:5307d294b6ccd9854f2deed8c1628b72", "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL", "btih:QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q", "passwd:$1$Jrbw4gbM$Er6MejOKAXqT.VTII8vGV%2F" ]

So, to be precise, we do not try to overstandardize (the various protocols we might encounter in the future may use various convoluted types of hashes), and in reality it is not even a real URI, because we do not register the prefixes at IANA. But for our intents and purposes it looks like a URI and quacks like a URI, so we can keep things simple and work with it as with a URI. :)
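A sketch of how such a list could be consumed, using the example values above. `group_by_scheme` is a hypothetical helper and not taken from Idea or IntelMQ; treating `%2F` in the `passwd` entry as percent-encoding of `/` is an assumption based on the URI-like form, so decoding is optional here:

```python
from collections import defaultdict
from urllib.parse import unquote


def group_by_scheme(entries, decode=False):
    """Group '<scheme>:<value>' entries into {scheme: [values]}.

    If decode is True, percent-encoded characters in the value are
    decoded (assumption: values are encoded like URI components).
    """
    grouped = defaultdict(list)
    for entry in entries:
        scheme, sep, value = entry.partition(":")
        if not sep or not scheme:
            raise ValueError("missing scheme prefix: {!r}".format(entry))
        grouped[scheme].append(unquote(value) if decode else value)
    return dict(grouped)


hash_list = [
    "md5:5307d294b6ccd9854f2deed8c1628b72",
    "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
    "btih:QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q",
    "passwd:$1$Jrbw4gbM$Er6MejOKAXqT.VTII8vGV%2F",
]
print(group_by_scheme(hash_list))
print(group_by_scheme(hash_list, decode=True)["passwd"])
```

Because the scheme is part of the value, a new hash type needs no change to the key set, which is the point made above about not overstandardizing.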

Regarding lists/arrays - you have mentioned https://github.com/certtools/intelmq/issues/394

so, if I understand correctly, you are considering multiple values for 1.1. From that point of view, I would consider leaving the hash as a single value for 1.0: multiple identifiers for one piece of content do occur, but rarely, and you are already keeping things simple with a single source IP and a single destination IP, for example (unless, of course, somebody _really_ needs a multi-value hash field). Then in 1.1 you can work out the list/multi-value issues and incorporate all of that in one go.

-- Pavel

P.S.: Maybe there are people here who don't know what I'm talking about when referencing Idea, so here's what we are using: https://idea.cesnet.cz/en/index

Tomás Lima

11:38 a.m.

Thank you Pavel for the excellent feedback.

Well, I really want to have the option to specify sha1, sha256 and md5 in the same message, since I'm planning to use results from sources like VirusTotal, which will be useful for correlating information, as Aaron mentioned in that issue:

"assume you are given a hash (sha1) of a piece of malware and you want to find it in the events table. However, you only stored the md5 since that is what you received even though the sender sent you both fields (sha1 and md4 - such as the n6 feed). Then you can not ever find the right entry again."

I vote for: { ... "malware.hash.md5": "<md5 hash>", "malware.hash.sha1": "<sha1 hash>", "malware.hash.sha256": "<sha256 hash>" ... }

instead of: { ... "malware.hash": "md5:<md5 hash>,sha1:<sha1 hash>,sha256:<sha256 hash>" ... }
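The two layouts carry the same information, so one can be converted into the other. A minimal sketch, assuming plain dicts, the `malware.hash.` key prefix from the first variant, and hash values without commas or leading type prefixes; `to_combined` and `to_per_type` are hypothetical names:

```python
PREFIX = "malware.hash."


def to_combined(event):
    """Build 'md5:...,sha1:...' from the per-type keys of an event dict."""
    parts = []
    for key in sorted(event):  # sorted for a stable, repeatable order
        if key.startswith(PREFIX):
            parts.append("{}:{}".format(key[len(PREFIX):], event[key]))
    return ",".join(parts)


def to_per_type(combined):
    """Expand 'md5:...,sha1:...' into one 'malware.hash.<type>' key each."""
    event = {}
    for item in combined.split(","):
        hash_type, sep, digest = item.partition(":")
        if not sep or not hash_type:
            raise ValueError("missing type prefix: {!r}".format(item))
        event[PREFIX + hash_type] = digest
    return event


per_type = {
    "malware.hash.md5": "<md5 hash>",
    "malware.hash.sha1": "<sha1 hash>",
    "malware.hash.sha256": "<sha256 hash>",
}
combined = to_combined(per_type)
print(combined)
print(to_per_type(combined) == per_type)
```

Since the conversion is mechanical in both directions, choosing the per-type keys now would not prevent a later move to a single multi-value field.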

For me, we can evaluate this issue again when we start discussing Full-JSON vs. the "single value" format for milestone v1.1, but for now I would go the way I showed.

What do you guys say?

Otmar Lendl

6 p.m.

New subject: [Intelmq-dev] IntelMQ Data Harmonization (DHO) - malware.hash key (issue 732)

On 01.02.2017 11:38, Tomás Lima wrote:

> Thank you Pavel for the excellent feedback.
>
> Well, I really want to have the option to specify sha1, sha256 and md5 in the same message, since I'm planning to use results from sources like VirusTotal, which will be useful to correlate information, as Aaron mentioned on that issue:
>
> "assume you are given a hash (sha1) of a piece of malware and you want to find it in the events table. However, you only stored the md5 since that is what you received even though the sender sent you both fields (sha1 and md5 - such as the n6 feed). Then you can not ever find the right entry again."
>
> I vote for:
>
>     { ... "malware.hash.md5": "<md5 hash>", "malware.hash.sha1": "<sha1 hash>", "malware.hash.sha256": "<sha256 hash>" ... }
>
> instead of:
>
>     { ... "malware.hash": "md5:<md5 hash>,sha1:<sha1 hash>,sha256:<sha256 hash>" ... }

I agree.

IMHO it boils down to whether you just store the info or whether you will ever want to search for the info.

Right now the eventdb is a traditional relational DB without native json or multivalue support.

As long as that is the case it is much better to stick to "one value in one field".

otmar

--
// Otmar Lendl <lendl@cert.at> - T: +43 1 5056416 711
// CERT Austria - http://www.cert.at/
// Eine Initiative der nic.at GmbH - http://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
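The store-versus-search distinction can be sketched with a small relational example. This is an illustration only: it uses an in-memory SQLite database rather than the actual eventdb, and the table layout (one column per hash type next to a combined "malware.hash" column) is an assumption made for comparison.

```python
import sqlite3

# Illustrative values taken from the examples in this thread.
md5 = "5307d294b6ccd9854f2deed8c1628b72"
sha1 = "LBPI666ED2QSWVD3VSO5BG5R54TE22QL"

# Assumed schema: per-type columns plus a combined single-value column.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE events ('
    ' id INTEGER PRIMARY KEY,'
    ' "malware.hash.md5" TEXT,'
    ' "malware.hash.sha1" TEXT,'
    ' "malware.hash" TEXT)'
)
conn.execute(
    'INSERT INTO events ("malware.hash.md5", "malware.hash.sha1", "malware.hash")'
    ' VALUES (?, ?, ?)',
    (md5, sha1, "md5:%s,sha1:%s" % (md5, sha1)),
)

# One value per field: a plain equality test that an ordinary index can serve.
by_column = conn.execute(
    'SELECT id FROM events WHERE "malware.hash.sha1" = ?', (sha1,)
).fetchall()

# Combined field: the search degrades to a substring match on the whole value.
by_pattern = conn.execute(
    'SELECT id FROM events WHERE "malware.hash" LIKE ?',
    ("%sha1:" + sha1 + "%",),
).fetchall()

print(by_column, by_pattern)
```

Both queries find the row, but only the first is a simple equality lookup; the second depends on the exact encoding of the combined string.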

Pavel Kácha

9 Feb 9 Feb

4:42 p.m.

New subject: [Intelmq-dev] IntelMQ Data Harmonization (DHO) - malware.hash key (issue 732)

Hi,

as I see no replies, my 2 cents: if you definitely need several different types of hashes for one content in the 1.0 timeframe, where multivalues are a hurdle, staying with explicit keys for the various hashes seems a rational solution to me.

However, it may be wise to do a bit of digging (maybe based on the currently supported set of parsers and their output?) into what types of hashes are already in the wild and in use, so users don't end up resorting to something generic too soon.

Cheers -- Pavel

From: Tomás Lima <synchroack@protonmail.ch>, Date: Feb 01, 2017

> Thank you Pavel for the excellent feedback.
>
> Well, I really want to have the option to specify sha1, sha256 and md5 in the same message, since I'm planning to use results from sources like VirusTotal, which will be useful to correlate information, as Aaron mentioned on that issue:
>
> "assume you are given a hash (sha1) of a piece of malware and you want to find it in the events table. However, you only stored the md5 since that is what you received even though the sender sent you both fields (sha1 and md5 - such as the n6 feed). Then you can not ever find the right entry again."
>
> I vote for:
>
>     { ... "malware.hash.md5": "<md5 hash>", "malware.hash.sha1": "<sha1 hash>", "malware.hash.sha256": "<sha256 hash>" ... }
>
> instead of:
>
>     { ... "malware.hash": "md5:<md5 hash>,sha1:<sha1 hash>,sha256:<sha256 hash>" ... }
>
> For me, we can evaluate this issue again when we start discussing Full-JSON vs the "single value" format in milestone v1.1, but for now I would go the way I showed. What do you guys say?

 -------- Original Message --------
 Subject: Re: [Intelmq-dev] IntelMQ Data Harmonization (DHO) -
 malware.hash key (issue 732)
 Local Time: February 1, 2017 10:13 AM
 UTC Time: February 1, 2017 10:13 AM
 From: ph@cesnet.cz
 To: Tomás Lima <synchroack@protonmail.ch>
 L. Aaron Kaplan <kaplan@cert.at>, intelmq-dev@lists.cert.at
 Hi,

 hmm, looking into uri/url/urn terms, I have used the wrong one, sorry for
 that. Because (from the name of the key in Idea) we know it is a hash, we
 do not use "urn:" (or "urn:hash") prefix to keep things simple, so in
 reality it is URI.

 So we use hash name directly as a URI scheme. As a scheme we use well
 known hash names (md5, sha1), or reasonably descriptive name of unusual
 hashes (btih for BitTorrent Info Hash, which can be encoded in various
 ways), or just "hash:" as a fallback, where we do not know the internal
 format of the hash (but we try not to overuse that and rather find out
 some more unique/descriptive identifier). Example:

 Hash: [
     "md5:5307d294b6ccd9854f2deed8c1628b72",
     "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
     "btih:QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q",
     "passwd:$1$Jrbw4gbM$Er6MejOKAXqT.VTII8vGV%2F"
 ]

 So, to be precise, we do not try to overstandardize (various protocols we
 might encounter in the future may use various convoluted types of hashes),
 and in reality it is not even a real URI, because we do not register the
 prefixes at IANA.

 But for our intents and purposes it looks like a URI, quacks like a URI,
 so we can keep things simple and work with that as with a URI. :)

 Regarding lists/arrays - you have mentioned
 https://github.com/certtools/intelmq/issues/394
 so if I understand well, you are considering multivalues for 1.1. From
 that point of view, I would consider leaving the hash as single value for
 1.0 - as multiple identifiers for one content happen, but rarely; and you
 are already keeping things simple with single source ip/single destination
 ip, for example (of course only in case there is not somebody who _really_
 needs the multivalue hash field).

 Then in 1.1 you can work out list/multivalue issues and incorporate all
 that in one go.

 -- Pavel

 P.S.: Maybe there are people here who don't know what I'm talking about
 when referencing Idea, so here's what we are using:
 https://idea.cesnet.cz/en/index
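
The scheme-prefixed hash strings described in the message above can be taken apart with a few lines of Python. This is a sketch only, not code from Idea or IntelMQ: the function name split_hash is hypothetical, treating an entry without any prefix as scheme "hash" is an assumption, and values are left exactly as written (no percent-decoding).

```python
# Hypothetical sketch: splitting "scheme:value" hash strings of the kind
# shown in the Hash example above. Not code from Idea or IntelMQ.

def split_hash(entry):
    """Return (scheme, value) for a 'scheme:value' string.

    Assumption: an entry with no colon at all is filed under the
    generic scheme 'hash'.
    """
    scheme, sep, value = entry.partition(":")
    if not sep:
        return ("hash", entry)
    return (scheme, value)


hashes = [
    "md5:5307d294b6ccd9854f2deed8c1628b72",
    "sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL",
    "btih:QHQXPYWMACKDWKP47RRVIV7VOURXFE5Q",
    "passwd:$1$Jrbw4gbM$Er6MejOKAXqT.VTII8vGV%2F",
]

# Group values by scheme; a list per scheme allows several hashes of one type.
by_scheme = {}
for entry in hashes:
    scheme, value = split_hash(entry)
    by_scheme.setdefault(scheme, []).append(value)

for scheme in sorted(by_scheme):
    print(scheme, by_scheme[scheme])
```

Because the type travels inside the value, a new hash type needs no change to the key names; the cost is that every consumer has to parse the prefix.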
