On 04/26/2017 05:24 PM, Otmar Lendl wrote:
> Ad b) IMHO the much trickier issue is actually processing huge data sets. Once you reach file sizes in the GB range, one needs to switch from "load everything into a data structure in RAM, then process it" to "load the next few KB from a data stream, process it, then get the next slice".
>
> My worry is that the current bot API cannot easily be converted to stream processing.
>
> We need to think this through.
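To illustrate the two modes, here is a minimal hypothetical sketch (not IntelMQ code; handle() is a placeholder for whatever per-record work a bot would do):

    def handle(piece: str) -> None:
        """Placeholder for the actual per-record processing."""
        pass

    def process_all(path: str) -> None:
        # Mode 1: load everything into a data structure in RAM, then process.
        # Fine for small reports, breaks down once files reach the GB range.
        with open(path) as f:
            data = f.read()              # the entire file is held in memory
        for line in data.splitlines():
            handle(line)

    def process_stream(path: str, chunk_size: int = 64 * 1024) -> None:
        # Mode 2: load the next few KB from a stream, process, then repeat.
        # Memory use stays flat regardless of the file size.
        with open(path) as f:
            while True:
                chunk = f.read(chunk_size)  # only one slice in memory at a time
                if not chunk:
                    break
                handle(chunk)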
The ParserBot [1] uses generators (i.e., it processes one line after another), with one exception: the Base64 decoding of `raw`. IMHO we should get rid of that anyway, as it just blows up the size; Redis can handle the data without Base64 just fine.
All parsers derived from ParserBot override only individual methods; they all work the same way. Note that not all of our parsers have been converted to the ParserBot class yet, but that is nothing spectacular.
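For illustration, a minimal sketch of that pattern (class and method names here are hypothetical, not the actual IntelMQ API; the real ParserBot is at [1]):

    import io
    from typing import Iterator

    class StreamingParser:
        """Generator-based parsing: one line is in flight at a time."""

        def parse(self, report: str) -> Iterator[str]:
            # Yield lines lazily; note that `report` is still one big string
            # in memory -- this mirrors the Base64-decoded `raw` issue above.
            for line in io.StringIO(report):
                line = line.strip()
                if line and not line.startswith('#'):
                    yield line

        def parse_line(self, line: str) -> dict:
            # Derived parsers typically override just this one method.
            raise NotImplementedError

        def process(self, report: str) -> Iterator[dict]:
            for line in self.parse(report):
                yield self.parse_line(line)

    class ExampleCSVParser(StreamingParser):
        def parse_line(self, line: str) -> dict:
            ip, timestamp = line.split(',', 1)
            return {'source.ip': ip, 'time.source': timestamp}

So as long as the input arrives as a stream, the per-event work already happens one line at a time; the full-report decode is the remaining obstacle.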
Sebastian
[1]: https://github.com/certtools/intelmq/blob/1.0.0.dev6/intelmq/lib/bot.py#L453