[Intelmq-dev] Proposal (Request For Comments) - IntelMQ with Run Modes & Process Management

Otmar Lendl lendl at cert.at
Wed Apr 26 17:24:03 CEST 2017


Hi,

Some comments:

(I'm not fully up to date on IntelMQ internals, so I might be off.)

On 21.04.2017 21:00, Navtej Singh wrote:
>
> With RandomizedDelaySec systemd will spread the execution over a period
> thus preventing this sudden rush for memory. This was very helpful.

I would be wary about relying on randomization. Random numbers have the
property that every now and then they are all identical.

So I'd consider that more of a CPU load-distribution measure than a
fix for the RAM usage.
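For reference, the mechanism being discussed is a systemd timer option. A minimal sketch of what such a timer unit could look like (the unit name and values are hypothetical, not taken from any IntelMQ packaging):

```ini
# intelmq-collector.timer -- hypothetical timer for a periodic collector
[Timer]
OnCalendar=hourly
# systemd delays each activation by a random amount in [0, 600] seconds,
# spreading start times over a 10-minute window instead of firing all
# collectors at once.
RandomizedDelaySec=600

[Install]
WantedBy=timers.target
```

Note that the delays are drawn independently per timer, so (as argued above) occasional clustering is still possible.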

> Another thing, which might be worth discussion is, collectors should have a
> flag, to save the collected input to a file, and parser could then
> potentially pick from queue or file. This will help in cases where the
> input size is relative large, eg blueliv or alienvault (subscribed to lot
> of pulses, reminds me I need to submit a PR for this enhancement). May be
> some enhancements to fileinput/fileoutput bot can do that, I haven't really
> explored it, however an integrated approach would be much better, imo.

IMHO there are multiple issues:

a) how to pass huge amounts of data between bots
b) how to process larger data-sets

Ad a)

Yes, passing a reference to a file (a filename?) instead of the content
of the file is one option. It may well be that using a different
message-passing backend (e.g. RabbitMQ) would also solve the issue.
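To make the file-reference option concrete, here is a minimal sketch (not IntelMQ's actual API; the spool directory and message format are assumptions for illustration): the collector writes the large payload to disk and sends only a small JSON reference through the queue, and the parser resolves it back.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical spool location shared by collector and parser.
SPOOL_DIR = Path(tempfile.gettempdir()) / "intelmq-spool"

def send_by_reference(payload: bytes) -> str:
    """Write a large payload to disk; return a small reference message."""
    SPOOL_DIR.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(dir=SPOOL_DIR, delete=False) as f:
        f.write(payload)
        path = f.name
    # Only this small JSON message travels through the queue.
    return json.dumps({"payload_path": path, "size": len(payload)})

def receive_by_reference(message: str) -> bytes:
    """Resolve a reference message back to its payload and clean up."""
    ref = json.loads(message)
    path = Path(ref["payload_path"])
    data = path.read_bytes()
    path.unlink()  # in this sketch, the consumer owns cleanup
    return data
```

One open design question with this approach is ownership: who deletes the file if the parser crashes before acknowledging the message.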

Ad b)

IMHO the much trickier issue is actually processing huge data-sets.
Once file sizes reach the GB range, one needs to switch from "load
everything into a data-structure in RAM, then process it" to "load the
next few KB from a data-stream, process it, then get the next slice".

My worry is that the current bot API cannot be easily converted to
stream processing.

We need to think this through.

> a. Replace redis as queue with something persistent. As present redis uses
> a lot of memory since it keeps the events in memory. if your feeds are
> getting data frequently and, in the chain, you have a slow processing
> expert, queue size keeps growing and so does the redis memory usage.

Yes.

> b. multiple events processing by single bot, This has been discussed a lot
> in issues and mailing lists. I have an implementation using gevents[2].
> However there are problems with this, those trade-offs I am fine with. c &
> d might help to resolve these issues.

Yes, some experts would be a **lot** more efficient if they can do bulk
processing.
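A hypothetical sketch of what bulk processing could look like, independent of IntelMQ's real bot API (the `receive` and `handle_batch` callables are assumptions): events are drained from the queue into a batch, and the expensive expert step runs once per batch instead of once per event.

```python
def process_in_batches(receive, handle_batch, max_batch=500):
    """Drain events and hand them to the expert in batches.

    `receive` returns the next event, or None when the queue is empty.
    `handle_batch` processes a list of events in one go, e.g. a single
    bulk DNS or database lookup instead of one lookup per event.
    """
    batch = []
    while True:
        event = receive()
        if event is not None:
            batch.append(event)
        if batch and (event is None or len(batch) >= max_batch):
            handle_batch(batch)
            batch = []
        if event is None:
            break
```

The trade-off mentioned above applies here too: acknowledging a whole batch at once needs per-event IDs to do correctly, which is where point c comes in.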

> c. events should have IDs. This will help in acknowledging the correct
> message in case of multi processing wrt to b.

Yes, but for a different reason: assume multiple CERTs that run
IntelMQ-to-IntelMQ cross-connects. You need a way to avoid building
forwarding loops. Persistent IDs can help (analogous to Message-IDs in
the Usenet context).

(Btw, continuing the Usenet analogy: some sort of Path: header would
also be helpful: a list of systems that this event has already passed
through.)
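The loop-avoidance idea can be sketched in a few lines (event layout and system names are hypothetical, purely to illustrate the Usenet-style mechanism): before forwarding, each system checks the event's path for its own identifier and stamps itself onto it.

```python
def should_forward(event: dict, our_system: str) -> bool:
    """Drop events that have already passed through us (loop prevention)."""
    return our_system not in event.get("path", [])

def stamp(event: dict, our_system: str) -> dict:
    """Append our system identifier to the path before forwarding."""
    event.setdefault("path", []).append(our_system)
    return event
```

Combined with a persistent per-event ID, this would let cross-connected instances both deduplicate and break forwarding loops.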

otmar
-- 
// Otmar Lendl <lendl at cert.at> - T: +43 1 5056416 711
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - http://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg

