[Intelmq-dev] Proposal (Request For Comments) - IntelMQ with Run Modes & Process Management

Fri Apr 28 15:27:25 CEST 2017

On 04/26/2017 05:24 PM, Otmar Lendl wrote:
> Ad b)
> IMHO much more tricky is the issue of actually processing huge
> data-sets. Once you reach file-sizes in the GB range one needs to switch
> from "load everything into a data-structure in RAM, then process it" to
> a "load next few KB from a data-stream, process it, then get next slice".
>
> My worry is that the current bot API cannot be easily converted to
> stream processing.
>
> We need to think this through.
The ParserBot[1] uses generators (i.e. it processes one line after
another), with one exception: the Base64 decoding of `raw`. IMHO we
should get rid of the Base64 encoding anyway, as it just blows up the
size by about a third. Redis can handle the data without Base64 just
fine.
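
To illustrate the generator approach, here is a minimal sketch in plain
Python. It is not the actual ParserBot code: `iter_lines`, `parse`, the
chunk size and the two-column input format are assumptions made up for
this example. It reads a stream in small slices, yields one event per
line, and also shows the size overhead of Base64.

```python
import base64
import io


def iter_lines(stream, chunk_size=4096):
    """Yield complete lines from a binary stream, reading it in slices."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line
    if buffer:
        # Final line that had no trailing newline.
        yield buffer


def parse(stream):
    """Hypothetical parser: one event dict per data line, produced lazily."""
    for line in iter_lines(stream):
        if not line.strip() or line.startswith(b"#"):
            continue  # skip blank lines and comments
        ip, _, name = line.decode().partition(",")
        yield {"source.ip": ip, "malware.name": name}


# Example input (made up): a comment header and two data lines.
report = io.BytesIO(b"# header\n192.0.2.1,foo\n192.0.2.2,bar\n")
events = list(parse(report))

# Base64 turns every 3 input bytes into 4 output characters.
raw = b"x" * 3000
encoded = base64.b64encode(raw)
```

Only the current slice and one partial line are held in memory, so the
file size does not matter as long as individual lines stay small.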

All parsers derived from ParserBot override only individual methods, so
they all work in the same way.
Note that not all of our parsers have been converted to the ParserBot
class yet, but converting them is nothing spectacular.
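
As a rough sketch of that pattern (the class and method names below are
hypothetical stand-ins, not the real ParserBot interface): a base class
drives the line-by-line loop, and a concrete parser overrides a single
method.

```python
class LineParser:
    """Hypothetical stand-in for a generic line-based parser base class."""

    def parse(self, report):
        # Default: iterate over the lines of the report, skipping blanks.
        for line in report.splitlines():
            if line.strip():
                yield line

    def parse_line(self, line):
        # Concrete parsers override this and yield zero or more events.
        raise NotImplementedError

    def process(self, report):
        # The shared driver loop; subclasses normally do not touch it.
        for line in self.parse(report):
            yield from self.parse_line(line)


class CommaParser(LineParser):
    """Example parser that only overrides parse_line."""

    def parse_line(self, line):
        ip, _, name = line.partition(",")
        yield {"source.ip": ip.strip(), "malware.name": name.strip()}


events = list(CommaParser().process("192.0.2.1, foo\n\n192.0.2.2, bar\n"))
```

Because the driver loop lives in the base class, a change to how lines
are read would apply to every derived parser at once.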

Sebastian

[1]:
https://github.com/certtools/intelmq/blob/1.0.0.dev6/intelmq/lib/bot.py#L453

-- 
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// An initiative of nic.at GmbH - https://www.nic.at/
// Company register number 172568b, Regional Court Salzburg
