Hi Navtej,
On Friday, 21 April 2017, 21:00:11, Navtej Singh wrote:
I would like to share some insights from working with intelmq with roughly 70 feeds. I have frequently run into these problems and tried to solve these on my own.
Thanks for adding your experiences and approaches. I believe in coming up with a number of ideas, trying some, and then finding a good solution, so it is good to see your approaches.
There are some concerns about whether systemd is the right solution. I believe it is. There are some aspects of systemd which are appealing and helpful. Running the bots as the intelmq user is a breeze with the User and Group directives. However, one of the biggest gains is with the RandomizedDelaySec directive.
If we had a process manager that knows how the bots are wired, it could just queue some one-time collectors behind each other if the insertion point before the experts is already loaded. So I don't think this is coupled to systemd in particular, though RandomizedDelaySec sounds interesting for some simple use cases.
I understand that I am about to expand the discussion here, however I feel it is a connected issue. There should be a way to prevent running multiple instances of a bot with the same id. As I see it, collectors and parsers, though different, are tightly coupled.
To me this sounds like a use case that should be considered in this discussion. See my other post (a few minutes ago), where I explain, using your example, why I consider this kind of "flow control" relevant.
a. Replace redis as the queue with something persistent. At present redis uses a lot of memory, since it keeps the events in memory. If your feeds are getting data frequently and you have a slow processing expert in the chain, the queue size keeps growing and so does the redis memory usage.
I also consider this a "flow control" issue: stop inserting stuff if the downstream pipe is full. Technically, "full" could mean that redis has used up its configured memory.
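To make the idea a bit more concrete, here is a rough sketch of such a gate. It is not intelmq code; the class name FlowGate and the two thresholds are made up for illustration, and the queue length is assumed to be reported by something else (for example the length of the redis list, or redis memory usage mapped to a number). It pauses insertion at a high-water mark and only resumes once the queue has drained to a low-water mark, so collectors do not flap on and off.

```python
class FlowGate:
    """Decides whether collectors may insert, based on downstream queue length.

    Hypothetical helper for illustration only; not part of intelmq.
    """

    def __init__(self, high_water, low_water):
        if low_water > high_water:
            raise ValueError("low_water must not exceed high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.paused = False

    def may_insert(self, queue_length):
        # Pause once the downstream queue reaches the high-water mark ...
        if queue_length >= self.high_water:
            self.paused = True
        # ... and resume only after it has drained to the low-water mark.
        elif queue_length <= self.low_water:
            self.paused = False
        # Between the two marks the previous decision is kept.
        return not self.paused


gate = FlowGate(high_water=1000, low_water=200)
decisions = [gate.may_insert(n) for n in (10, 999, 1000, 500, 200, 300)]
print(decisions)  # [True, True, False, False, True, True]
```

The gap between the two marks is a design choice: the wider it is, the less often collectors get stopped and restarted.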
b. Multiple events processed by a single bot. This has been discussed a lot in issues and on the mailing lists. I have an implementation using gevent[2]. There are problems with it, but I am fine with those trade-offs. c & d might help to resolve these issues.
Can you point me to a more elaborate outline of the problem? (I always thought that a bot can already process several events, but you mean per network event?)
c. Events should have IDs. This will help in acknowledging the correct message in case of multi-processing with respect to b.
My mental model tells me that if the information about an abuse sighting is the same, it should count as the "same" for intelmq, so an ID wouldn't help. Somehow intelmq must record the contents of the "events" and deduplicate anyway.
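What I mean by "the same" could look roughly like the following sketch. It is only an illustration under assumptions, not how intelmq does it: the helper names are invented, the event is a plain dict, and I assume that a field such as time.observation should be ignored because it differs between two sightings of the same information.

```python
import hashlib
import json


def fingerprint(event, ignore=("time.observation",)):
    """Return a hash over the event contents, skipping the ignored fields.

    Hypothetical helper; the ignored field list is an assumption.
    """
    relevant = {key: value for key, value in event.items() if key not in ignore}
    # Sorted keys make the serialisation independent of insertion order.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Deduplicator:
    """Remembers fingerprints in memory and reports whether an event is new."""

    def __init__(self):
        self.seen = set()

    def is_new(self, event):
        fp = fingerprint(event)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        return True


first = {
    "source.ip": "192.0.2.1",
    "classification.type": "scanner",
    "time.observation": "2017-04-21T21:00:11+00:00",
}
# Same contents, different key order and a later observation time.
second = {
    "time.observation": "2017-04-22T08:30:00+00:00",
    "classification.type": "scanner",
    "source.ip": "192.0.2.1",
}

dedup = Deduplicator()
results = [dedup.is_new(first), dedup.is_new(second)]
print(results)  # [True, False]
```

With such a content-derived fingerprint, two deliveries of the same sighting are recognised without anybody having to assign an ID up front.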
d. Bots should be able to peek at the message count in the source queue. This will help with b. as well as with the backoff algorithm discussed in other places; iirc Sebastian proposed it in some GitHub issues. This is really simple. I had written the peek function, however I cannot locate it as of now.
This sounds like the bots implementing some "flow control" themselves. From a design perspective I think the bot should know and somehow register what it wants to do or handle; the control itself, however, seems feasible from an oversight process, from my perspective.
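As a rough sketch of what such an oversight process could decide: given the wiring and the current queue lengths, it works out which collectors to hold back. Everything here is hypothetical (the function name, the bot and queue names, the limit), and the queue lengths are assumed to be collected elsewhere.

```python
def collectors_to_hold(wiring, queue_lengths, limit):
    """Return the collector ids whose destination queue is at or above limit.

    wiring maps a collector id to the name of the queue it feeds.
    Hypothetical helper for illustration; not part of intelmq.
    """
    held = []
    for collector_id, destination in wiring.items():
        # A queue missing from the report is treated as empty.
        if queue_lengths.get(destination, 0) >= limit:
            held.append(collector_id)
    return sorted(held)


wiring = {
    "feed-a-collector": "feed-a-parser-queue",
    "feed-b-collector": "feed-b-parser-queue",
}
lengths = {"feed-a-parser-queue": 12000}

held = collectors_to_hold(wiring, lengths, limit=10000)
print(held)  # ['feed-a-collector']
```

The bots stay simple this way: they only register what they feed, and the decision to hold them lives in one place.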
Best Regards, Bernhard