[Intelmq-dev] Proposal (Request For Comments) - IntelMQ with Run Modes & Process Management

Tue Apr 25 10:32:36 CEST 2017

Hi Navtej,

On Friday, 21 April 2017 at 21:00:11, Navtej Singh wrote:
> I would like to share some insights from working with intelmq with roughly
> 70 feeds. I have frequently run into these problems and tried to solve
> these on my own.

thanks for adding your experiences and approaches.
I believe in coming up with a number of ideas, trying some, and then finding
a good solution, so it is good to see your approaches.

> There are some concerns if systemd is a right solution. I believe it is.
> There are some aspects of systemd which are appealing and helpful. Running
> the bots as intelmq user is a breeze, with User and Group directives.
> However one of the biggest gains is with RandomizedDelaySec directive. 

If we had a process manager that knows how the bots are wired, it could just
queue one-time collectors one after another whenever the insertion point
before the experts is already under load. So I don't think this is tied to
systemd in particular, though RandomizedDelaySec sounds interesting for some
simple use cases.
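
To make that a bit more concrete, here is a minimal Python sketch of a
manager that holds back one-time collectors while the insertion point is
loaded. The class name, the is_loaded callback and the bot ids are made up
for illustration; none of this is existing intelmq code.

```python
from collections import deque


class CollectorScheduler:
    """Hypothetical helper: releases queued one-time collectors one by one."""

    def __init__(self, is_loaded):
        # is_loaded: callable returning True while the insertion point
        # before the experts still has too much work queued (assumption).
        self._is_loaded = is_loaded
        self._pending = deque()

    def submit(self, bot_id):
        """Queue a collector run; ignore ids that are already waiting."""
        if bot_id not in self._pending:
            self._pending.append(bot_id)

    def next_to_start(self):
        """Return the next collector to start, or None to keep waiting."""
        if not self._pending or self._is_loaded():
            return None
        return self._pending.popleft()
```

The manager would call next_to_start() periodically and launch whatever
comes back, so collectors line up behind each other without any timer.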

> I understand that I am about to expand the discussion here, however I feel
> it is a connected issue. There should be a way to prevent running multiple
> instances of a bot with the same id. As I see it, collectors and parsers,
> though different, are tightly coupled.

To me this sounds like a use case that should be considered in this
discussion. See my other post (a few minutes ago) where I explain why I
consider this kind of "flow control" relevant with your example.
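
Independent of who ends up doing the process management, one simple way to
refuse a second instance with the same bot id is an advisory lock file. The
following is only a sketch under assumptions: the function name and the lock
directory are hypothetical, it relies on fcntl.flock and therefore works on
Unix-like systems only, and it is not how intelmq does it today.

```python
import fcntl
import os


def acquire_bot_lock(bot_id, lock_dir):
    """Try to take an exclusive lock for bot_id.

    Returns the open file object holding the lock, or None if another
    instance already holds it. Keep the returned object alive for as
    long as the bot runs; closing it releases the lock.
    """
    path = os.path.join(lock_dir, bot_id + ".lock")  # hypothetical layout
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle
```

The kernel drops the lock when the process exits, so a crashed bot does not
leave a stale lock behind the way a plain PID file can.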

> a. Replace redis as queue with something persistent. At present redis uses
> a lot of memory since it keeps the events in memory. If your feeds are
> getting data frequently and, in the chain, you have a slow processing
> expert, queue size keeps growing and so does the redis memory usage.

I also consider this a "flow control" issue: stop inserting events when the
downstream pipe is full. Technically, "full" could mean that redis has used
up its configured memory.
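
As a rough Python sketch of such a "full" check: the function below takes a
mapping shaped like the memory section of the redis INFO command (for
example what redis-py returns from client.info("memory")). The function
name and the 90% threshold are my assumptions, and it assumes the mapping
carries the used_memory and maxmemory fields.

```python
def redis_memory_full(info, threshold=0.9):
    """Return True if used memory has reached threshold * maxmemory.

    info: mapping with 'used_memory' and 'maxmemory' in bytes.
    A maxmemory of 0 (or a missing key) is treated as "no limit
    configured", in which case the pipe is never reported as full.
    """
    limit = info.get("maxmemory", 0)
    if not limit:
        return False
    return info["used_memory"] >= threshold * limit
```

A collector, or a manager on its behalf, would skip or delay its next run
while this returns True.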

> b. multiple events processing by a single bot. This has been discussed a
> lot in issues and mailing lists. I have an implementation using gevents[2].
> However there are problems with this, though I am fine with those
> trade-offs. c & d might help to resolve these issues.

Can you point me to a more elaborate outline of the problem?
(I always thought that a bot can already process several events, but you mean 
per network event?)

> c. events should have IDs. This will help in acknowledging the correct
> message in case of multi processing wrt to b.

My mental model tells me that if the information about an abuse sighting is
the same, it should count as the "same" event for intelmq, so an ID wouldn't
help. Somehow intelmq must record the contents of the "events" and
deduplicate anyway.
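
What I have in mind is roughly a fingerprint computed from the contents. A
minimal Python sketch, assuming an event is a flat dict of JSON-serializable
values and that fields like time.observation should not take part in the
comparison (the function name and the ignore list are my assumptions):

```python
import hashlib
import json


def event_fingerprint(event, ignore=("time.observation",)):
    """Hash the event contents so that equal sightings get equal ids."""
    stable = {key: value for key, value in event.items() if key not in ignore}
    # sort_keys gives a canonical form independent of insertion order.
    payload = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two reports of the same sighting then map to the same fingerprint, which
could be looked up in whatever store is used for deduplication.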

> d. bots should be able to peek at message count in the source queue. This
> will help with b. as well as the backoff algorithm discussed at other
> places, iirc Sebastian proposed it on some github issues. This is really
> simple; I had written the peek function, however I cannot locate it as of now.

This sounds like the bots implementing some "flow control" themselves.
From a design perspective I think the bot should know and somehow register
what it wants to do or handle; the control itself, however, seems feasible
from an oversight process.
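
A small Python sketch of how such an oversight process could look at queue
lengths from the outside. It only needs an object with an llen(name) method,
which is what a redis-py client offers for the LLEN command; the function
name, the mapping of bot ids to destination queues and the limit are
hypothetical.

```python
def overloaded_bots(client, destinations, max_len):
    """Return the ids of bots that should be held back.

    client: anything with llen(name) -> int, e.g. a redis-py client.
    destinations: dict mapping a bot id to the list of queue names it
        writes to (a made-up layout for this sketch).
    A bot counts as overloaded if any of its destination queues
    currently holds more than max_len messages.
    """
    return sorted(
        bot_id
        for bot_id, queues in destinations.items()
        if any(client.llen(name) > max_len for name in queues)
    )
```

The bots themselves stay simple this way; only the overseer needs to know
the wiring and the limits.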

Best Regards,
Bernhard

-- 
www.intevation.de/~bernhard   +49 541 33 508 3-3
Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998
Geschäftsführer Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner