[Intelmq-dev] Proposal (Request For Comments) - IntelMQ with Run Modes & Process Management

Bernhard Reiter bernhard at intevation.de
Fri Apr 28 09:54:58 CEST 2017


Hi,

On Wednesday 26 April 2017 17:24:03, Otmar Lendl wrote:
> b) how to process larger data-sets

> IMHO much more tricky is the issue of actually processing huge
> data-sets. Once you reach file-sizes in the GB range one needs to switch
> from "load everything into a data-structure in RAM, then process it" to
> a "load next few KB from a data-stream, process it, then get next slice".

Note that there is code to split up line-based data such as CSV; see
https://github.com/certtools/intelmq/pull/680
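A minimal sketch of the "load the next slice, process it, get the next one" approach for line-based data, in Python (the language IntelMQ is written in). This is not the code from the pull request above. The function name `iter_csv_batches` and the default batch size are hypothetical; the sketch only shows the general technique of letting `csv.DictReader` pull rows lazily from a file object, so memory use depends on the batch size rather than on the file size. Unlike splitting on raw newlines, the csv module also keeps quoted fields that contain line breaks in a single row.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_csv_batches(stream: TextIO, batch_size: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield the rows of a CSV stream in lists of at most batch_size rows.

    csv.DictReader pulls rows from the file object lazily, so only the
    current batch is held in memory, whatever the total file size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    reader = csv.DictReader(stream)  # first row is taken as the header
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A quoted field may contain a line break; the csv module keeps it in one row.
    sample = io.StringIO('ip,note\n192.0.2.1,"two\nlines"\n192.0.2.2,ok\n192.0.2.3,ok\n')
    for batch in iter_csv_batches(sample, batch_size=2):
        print(len(batch), [row["ip"] for row in batch])
```

For a file on disk, open it with `newline=""` as the csv documentation recommends and loop over `iter_csv_batches(f)`, handing each batch on as its own, smaller report.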

> > c. events should have IDs. This will help in acknowledging the correct
> > message in case of multiprocessing w.r.t. b.
>
> Yes, but for a different reason: Assume more CERTs that do
> IntelMQ-IntelMQ cross-connects. You need a way to avoid building
> forwarding-loops. Persistent IDs can help (analogue to Message-IDs in
> the Usenet context).

The problem I see with this approach is that there is no single
origin of the information that could assign a unique ID to it.
Say two observing systems notice the same "abuse event"
on a machine somewhere and start processing it: they might assign two
different IDs to the same event. Just checking the ID later for duplicates
would not help.

Or say a single event gets a UID, runs through two systems that process
it slightly differently, and then ends up in one abuse system via
two different sources. It then has the same UID, but different data details.
Just rejecting the second incoming report on the basis of the UID would throw
additional information away and does not seem to be enough.

This is why I still think that a system should have a (working code)
definition of when it considers two events equal, and then apply it to
each report for deduplication and for preventing forwarding loops.
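To make such a "(working code) definition" concrete, here is one possible sketch in Python. The identity fields, the helper names `event_key` and `deduplicate`, and the merge policy (the first event's values win; later duplicates only contribute fields the first one lacks) are all assumptions for illustration, not an IntelMQ API. Matching on content rather than on an assigned ID means the two observing systems above still produce the same key, and merging instead of rejecting keeps the extra details a second report carries. A real deployment would also need a persistent, expiring store of keys to catch loops over time; this sketch only deduplicates within one batch.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical identity definition: events that agree on these fields are
# treated as "the same abuse event". Names follow IntelMQ's dotted style,
# but which fields to use is an assumption, not a project decision.
IDENTITY_FIELDS: Sequence[str] = (
    "source.ip",
    "source.port",
    "classification.type",
    "time.source",
)


def event_key(event: Dict[str, Any], fields: Sequence[str] = IDENTITY_FIELDS) -> str:
    """Return a stable hash over the identity fields only.

    Missing fields hash as null, and sort_keys makes the JSON canonical,
    so key order in the input dict does not matter.
    """
    identity = {name: event.get(name) for name in fields}
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge events with equal keys instead of dropping the later ones.

    The first event's values win; a later duplicate only adds fields the
    kept event does not have yet, so extra details are not thrown away.
    Output order follows the first occurrence of each key.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for event in events:
        key = event_key(event)
        if key not in merged:
            merged[key] = dict(event)  # copy so the input is not mutated
        else:
            kept = merged[key]
            for name, value in event.items():
                kept.setdefault(name, value)
    return list(merged.values())
```

An event that comes back through an IntelMQ-to-IntelMQ cross-connect yields the same key, so it is folded into the existing entry rather than forwarded a second time.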

> (btw the Usenet analogy: some sort of Path: header would also be
> helpful: a list of Systems that this event has already passed through.)

Email and news headers cannot be trusted much; what would we gain from that
information?

Best,
Bernhard

-- 
www.intevation.de/~bernhard   +49 541 33 508 3-3
Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998
Geschäftsführer Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner