[Intelmq-dev] Proposal (Request For Comments) - IntelMQ with Run Modes & Process Management
Otmar Lendl
lendl at cert.at
Wed Apr 26 17:24:03 CEST 2017
Hi,
Some comments:
(I'm not fully up to date on IntelMQ internals, so I might be off.)
On 21.04.2017 21:00, Navtej Singh wrote:
>
> With RandomizedDelaySec systemd will spread the execution over a period
> thus preventing this sudden rush for memory. This was very helpful.
I would be wary of relying on randomization. Random delays only spread
the load on average; every now and then they will all cluster together.
So I'd consider that a CPU load-distribution measure, not a fix for the
RAM usage.
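For context, the feature under discussion is a systemd timer option. A
minimal, purely illustrative timer unit (the unit name and schedule are
made up for this example) might look like:

```ini
[Unit]
Description=Run the example collector periodically

[Timer]
OnCalendar=hourly
# Spread the actual start over a 15-minute window after each trigger,
# so many timers firing at the same wall-clock time do not all start at once.
RandomizedDelaySec=900

[Install]
WantedBy=timers.target
```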
> Another thing which might be worth discussing is: collectors should have a
> flag to save the collected input to a file, and the parser could then
> potentially pick from the queue or the file. This will help in cases where the
> input size is relatively large, e.g. blueliv or alienvault (subscribed to a lot
> of pulses, reminds me I need to submit a PR for this enhancement). Maybe
> some enhancements to the fileinput/fileoutput bots can do that; I haven't really
> explored it, however an integrated approach would be much better, imo.
IMHO there are multiple issues:
a) how to pass huge amounts of data between bots
b) how to process larger data-sets
Ad a)
yes, passing a reference to a file (a filename?) instead of the content
of the file is one option. It may well be that a different
message-passing backend (e.g. RabbitMQ) would also solve the issue.
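To make the file-reference idea concrete, here is a small sketch. The
field names (`raw_path`) and the spool-directory convention are
hypothetical, not part of the current IntelMQ event format:

```python
import json
import tempfile


def spool_payload(raw_bytes, spool_dir):
    """Write a large raw report to disk and return a small reference event.

    Instead of pushing megabytes of raw data through the queue, the
    collector writes the payload to a spool directory and the queued
    event carries only the path. (Field names here are illustrative.)
    """
    with tempfile.NamedTemporaryFile(dir=spool_dir, suffix=".raw",
                                     delete=False) as fh:
        fh.write(raw_bytes)
        path = fh.name
    # The event on the queue stays tiny: it references the payload by path.
    return json.dumps({"feed.name": "example-feed", "raw_path": path})
```

A parser would then open `raw_path` itself; cleanup of the spool files
would need to be defined somewhere (e.g. after successful parsing).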
Ad b)
IMHO much more tricky is the issue of actually processing huge
data-sets. Once you reach file sizes in the GB range, one needs to
switch from "load everything into a data structure in RAM, then process
it" to "load the next few KB from a data stream, process them, then get
the next slice".
My worry is that the current bot API cannot be easily converted to
stream processing.
We need to think this through.
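The stream-processing pattern I mean is roughly the following: read a
bounded chunk, emit the complete records it contains, and carry the
remainder over to the next chunk. This is a generic sketch, not the
current bot API:

```python
def parse_stream(fh, chunk_size=64 * 1024):
    """Yield complete lines from a binary file object, one chunk at a time.

    Memory use is bounded by chunk_size plus one record, regardless of
    the total file size.
    """
    buffer = b""
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last newline is complete; the rest is
        # carried over into the next iteration.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line:
                yield line.decode("utf-8")
    if buffer:
        yield buffer.decode("utf-8")
```

The hard part is not this loop but that a bot's process() would have to
become an incremental consumer of such a generator.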
> a. Replace redis as queue with something persistent. At present redis uses
> a lot of memory since it keeps the events in memory. If your feeds are
> getting data frequently and, in the chain, you have a slow processing
> expert, the queue size keeps growing and so does the redis memory usage.
Yes.
> b. multiple-events processing by a single bot. This has been discussed a lot
> in issues and mailing lists. I have an implementation using gevents[2].
> However there are problems with this; those trade-offs I am fine with. c &
> d might help to resolve these issues.
Yes, some experts would be a **lot** more efficient if they could do
bulk processing.
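The typical win is deduplicating an expensive lookup across a whole
batch. A sketch of the idea (the `lookup` callable and the field names
are placeholders, not the IntelMQ bot API):

```python
def enrich_batch(events, lookup):
    """Enrich many events with a single deduplicated backend call.

    'lookup' stands in for any expensive per-query backend (DNS, whois,
    a database): it takes a sorted list of IPs and returns a dict
    mapping IP -> ASN. One round-trip serves the whole batch instead of
    one round-trip per event.
    """
    ips = sorted({e["source.ip"] for e in events if "source.ip" in e})
    results = lookup(ips)  # one call for the whole batch
    for e in events:
        asn = results.get(e.get("source.ip"))
        if asn is not None:
            e["source.asn"] = asn
    return events
```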
> c. events should have IDs. This will help in acknowledging the correct
> message in case of multi processing wrt to b.
Yes, but also for a different reason: assume several CERTs that run
IntelMQ-to-IntelMQ cross-connects. You need a way to avoid building
forwarding loops. Persistent IDs can help (analogous to Message-IDs in
the Usenet context).
(Btw, to continue the Usenet analogy: some sort of Path: header would
also be helpful: a list of systems that the event has already passed
through.)
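Both mechanisms together could look like the sketch below. The field
names (`event_id`, `path`) are hypothetical; the point is only that an
ID is assigned once and survives forwarding, while the path trace lets
a receiving system detect its own name and drop the event:

```python
import uuid


def stamp_event(event, system_id):
    """Assign a persistent ID and a Usenet-style path trace to an event.

    Returns the stamped event, or None if this system already appears
    in the path (i.e. a forwarding loop was detected).
    """
    # The ID is set only once, so it stays stable across cross-connects.
    event.setdefault("event_id", str(uuid.uuid4()))
    path = event.get("path", [])
    if system_id in path:
        return None  # loop: we have already handled this event
    event["path"] = path + [system_id]
    return event
```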
otmar
--
// Otmar Lendl <lendl at cert.at> - T: +43 1 5056416 711
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - http://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg