Hi,
On Friday, 21 April 2017 12:26:12, Sebastian Wagner wrote:
https://github.com/wagner-certat/intelmq/blob/proposal/docs/proposal.md It's also an outcome of the same discussion, with some differences and simplifications. But it tackles fewer issues.
What issues are we trying to solve? I increasingly feel that it would help us if we pulled together the problems we are trying to solve and wrote them down, even if only as bullet points or keywords.
The idea is to keep intelmq simple (which is one of the design goals from the beginning of this project) and use existing and well-known tools instead of implementing our own bunch of bugs.
I agree with this design goal. As simple as possible, but not simpler than necessary.
If we want intelmq simple, my strong recommendation is:
a) Implement (and thus support) only one process management solution. So if a proposal including systemd is considered the leading solution after the discussion, we should implement it and remove the other process management approaches.
b) Reduce the number of run modes as much as we can, as each one adds complexity in thinking and coding. If all processes shall be "bots", then the bots themselves should decide how often they run. It should be within their code. So my idea, briefly outlined, may actually be the simpler solution. (There may be other ideas as well.)
c) Remove the "reload" option. So far I think the potential benefits are outweighed by the cost.
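To make point b) concrete, here is a minimal sketch of a bot that owns its own schedule instead of relying on an external run mode. The class `SelfSchedulingBot`, its `interval_seconds` parameter and its `process` method are hypothetical names for illustration; this is not the intelmq bot API.

```python
import time


class SelfSchedulingBot:
    """Hypothetical bot that decides itself how often it runs."""

    def __init__(self, interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.interval_seconds = interval_seconds
        self._clock = clock    # injectable so the loop can be tested without waiting
        self._sleep = sleep
        self.runs = 0

    def process(self):
        # Placeholder for the real work (fetch a feed, parse a report, ...).
        self.runs += 1

    def run(self, max_runs):
        # The first run starts immediately; later runs are spaced by the interval.
        next_run = self._clock()
        while self.runs < max_runs:
            delay = next_run - self._clock()
            if delay > 0:
                self._sleep(delay)
            self.process()
            next_run += self.interval_seconds
```

With something like this, the process manager would only need to start and stop the bot; "run once", "run every N minutes" and "run continuously" become settings of the same loop rather than separate run modes.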
However again I'm unsure if systemd is a good solution.
Personally, I want to keep the PID-based approach and encourage developers to provide support for supervisord etc. Making systemd a hard requirement is not *my* intention. However, my idea and intention is to not implement the process management ourselves, but maybe we can't avoid it because of:
As for a) above: we should only have to maintain and think through one solution. I think it is okay to make systemd a hard requirement if that is the leading implementation idea after scrutinising a number of ideas.
I really like using other components, but not at all costs. The component to be used must fit quite well; otherwise the break-even point is easily reached where learning, adapting and adopting a different component is more work than rolling your own.
My second concern regarding systemd is more important: systemd is a general process manager, whereas for intelmq we may need a process manager that implements tactics specific to abuse handling and intelmq's design. For example: what shall be done if we get congestion in intelmq? Should the start of scheduled bot runs be queued up or dropped? Which ones are more important than others? My gut feeling is that we will need some sort of flow control and that systemd will be unable to manage that without additional stunts.
Flow control is definitely an issue and a big topic we should discuss in depth. We (certat) do not need flow control currently but maybe you do?
What I mean by flow control is that we take the relations between the bots into account and implement strategies and tactics based on intelmq-specific information. In Navtej's Friday post you can see how he makes use of this information and proposes improved solutions to steer the flow within intelmq. To me it feels like an intelmq process manager can do this much better, because it already knows how the pipes are wired together.
Sooner or later I guess intelmq will need this kind of "flow control" to be able to fulfill its promise of providing a fast and fully automatable system. So it may become interesting to you at certat as well. :)
In our test runs we did see a number of congestions, partly due to some other defects or suboptimal configurations, but observing how the system recovers from these situations, it can certainly do much better.
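As a rough illustration of the kind of flow control meant here, the sketch below decides what to do with a scheduled bot run based on how full the bot's destination queues are. All names (`decide_scheduled_run`, `destinations`, `queue_depths`, `high_water`, `droppable`) are assumptions made up for this example; they do not come from intelmq or from any proposal in this thread.

```python
from enum import Enum


class Decision(Enum):
    RUN = "run"      # start the scheduled run now
    DEFER = "defer"  # queue the run up until the congestion clears
    DROP = "drop"    # skip this run entirely


def decide_scheduled_run(bot_id, destinations, queue_depths, high_water, droppable):
    """Decide what to do with a scheduled run of bot_id.

    destinations: dict mapping a bot id to the queues it writes to (the pipe wiring)
    queue_depths: dict mapping a queue name to its current number of messages
    high_water:   queue depth at or above which a queue counts as congested
    droppable:    set of bot ids whose runs may be skipped under congestion
    """
    congested = any(
        queue_depths.get(queue, 0) >= high_water
        for queue in destinations.get(bot_id, [])
    )
    if not congested:
        return Decision.RUN
    # Under congestion, less important runs are dropped, the others wait.
    return Decision.DROP if bot_id in droppable else Decision.DEFER
```

The point of the sketch is only that such a decision needs the pipe wiring and the queue depths as input, which a general-purpose process manager does not have.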
The "reload" action is unclear to me. Why make things more complicated? From my perspective doing more than "restart"
Does "reload" do more than "restart"? AFAIU, they perform the same checks. The only difference is that restart stops and starts the running continuous bots, while reload sends SIGHUP to them.
If it does not do more, get rid of it. (I thought it aimed at doing more, but then bots would need to be prepared to flush some of their data structures while running. It is much simpler for bot writers to just write for stop and start.)
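To show what "being prepared for reload" costs a bot writer, here is a minimal sketch of a bot that reacts to SIGHUP. It assumes a Unix system; `ReloadableBot`, `load_config` and `before_next_message` are hypothetical names, not intelmq code.

```python
import signal


class ReloadableBot:
    """Hypothetical bot that re-reads its configuration on SIGHUP."""

    def __init__(self, load_config):
        self._load_config = load_config   # callable returning the current config
        self.config = load_config()
        self._reload_requested = False

    def install_signal_handler(self):
        # Unix only: SIGHUP is not available on Windows.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Only set a flag here; the actual reload happens at a safe point.
        self._reload_requested = True

    def before_next_message(self):
        # Safe point between two messages: apply a pending reload.
        if self._reload_requested:
            self._reload_requested = False
            self.config = self._load_config()
            # Any state derived from the old config (caches, connections,
            # compiled rules) would have to be rebuilt here as well.
```

With plain stop and start, none of this is needed: the bot reads its configuration once in its constructor and is simply started again.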
Best Regards, Bernhard