Hi,
On 7/30/21 11:22 AM, Mika Silander wrote:
I didn't mention the name of our ticketing system :-).
IntelMQ only has a collector for one ticketing system, so I assumed you are using that one.
Anyway, I was referring to stopping the entire IntelMQ bot infrastructure on a server. I think it is better not to require the admins to remember to stop one specific bot before server maintenance, reboots, etc. The ultimate solution in our case is to switch to a ticketing system that truly supports transactions (in the database sense).
Having a meeting is fine, although I fear I don't have much to offer apart from the idea of using signals the way I already explained. The cleanest alternative for implementing graceful shutdown could be to have the bots accept a control message ("graceful shutdown") sent by intelmqctl to all bots, but the risk of this turning into over-engineering and/or a KISS violation is high, right?
That would require a control channel for sending commands. Maybe we need to rely more on threading anyway, because the main thread catches all the signals and can handle them gracefully, while worker threads could continue their work uninterrupted. That already exists with our multi-threading feature, which I implemented a while ago. E.g. the main thread handles SIGHUP and sets an event for all the instances: https://github.com/certtools/intelmq/blob/343432f7aea7a59a578762aa961b635a03... Extending that might mean some cleanup and always running in threading mode, even if it's not necessary (having only one instance). But in principle, that's already a kind of PoC for such an implementation.
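A rough sketch of that pattern, assuming a hypothetical ThreadedBot class (the class, its attributes, the choice of SIGTERM and the in-memory queue are illustrative only, not IntelMQ's actual threading code). The handler in the main thread only records the request; the main loop then sets a threading.Event, and each worker checks it only between messages, so a message that has been taken is always finished before the worker exits:

```python
import queue
import signal
import threading
import time


class ThreadedBot:
    """Hypothetical sketch; not IntelMQ's real threading implementation."""

    def __init__(self, n_workers=2):
        self.inbox = queue.Queue()           # stand-in for the bot's input queue
        self.processed = []
        self._lock = threading.Lock()
        self.stop_requested = False          # set by handle_stop()
        self.stop_event = threading.Event()  # read by the worker threads
        self.workers = [threading.Thread(target=self._work) for _ in range(n_workers)]

    def handle_stop(self, signum, frame):
        # Runs in the main thread when used as a signal handler; a plain assignment.
        self.stop_requested = True

    def _work(self):
        while not self.stop_event.is_set():
            try:
                # Short timeout so an idle worker re-checks the event regularly.
                message = self.inbox.get(timeout=0.1)
            except queue.Empty:
                continue
            # Once taken, a message is processed to completion before re-checking.
            with self._lock:
                self.processed.append(message)  # placeholder for process()
            self.inbox.task_done()

    def run(self, signum=signal.SIGTERM, install_handler=True):
        if install_handler:
            signal.signal(signum, self.handle_stop)  # allowed in the main thread only
        for worker in self.workers:
            worker.start()
        while not self.stop_requested:
            time.sleep(0.1)                  # main thread idles, waiting for a signal
        self.stop_event.set()                # notify all workers outside the handler
        for worker in self.workers:
            worker.join()                    # wait for in-flight messages to finish
        return self.processed
```

Setting the Event in the main loop rather than inside the handler is a deliberate choice in this sketch: it keeps the signal handler to a single attribute assignment, so no locks are acquired while the handler runs.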
best regards Sebastian
Br, Mika
----- Original Message ----- From: "Sebastian Wagner" wagner@cert.at To: "Mika Silander" mika.silander@csc.fi, "intelmq-dev" intelmq-dev@lists.cert.at Sent: Friday, 30 July, 2021 11:17:55 Subject: Re: [IntelMQ-dev] Preventing lost events when stopping intelmq
Hi Mika,
On 7/26/21 9:18 AM, Mika Silander wrote:
Back from a short holiday now, thanks for the answer. The reason for my question was not actually related to IntelMQ itself but to the ticketing system we have behind IntelMQ. This ticketing system will end up with inconsistent information if IntelMQ is stopped in the middle of event processing, and I'd like to minimize the likelihood of this happening. People familiar with this ticketing system would rightfully argue that the system itself is nothing but an inconsistency, but that's another story.
Are you referring to the interruption of the RT collector alone, or of the IntelMQ instance as a whole?
Continuing with the idea of using signals: would it be possible to implement a signal handling routine (for a signal other than kill) that cleanly shuts down a bot if it is not processing an event? And, if it is processing one, sets a flag so that the bot shuts down once processing is finished? Still, if I'm not mistaken, Linux doesn't guarantee the delivery of signals, so even this approach isn't foolproof.
I think so, yes. But I'm not an expert in Linux's and Python's signal handling, and I have misunderstood it in the past.
See also: https://github.com/certtools/intelmq/issues/1247
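A minimal sketch of the deferred-shutdown idea, assuming a hypothetical single-threaded bot (GracefulBot, handle_stop, install_handler and the list used as an input queue are illustrative, not IntelMQ's Bot API; SIGTERM stands in for "a signal other than kill", since SIGKILL cannot be caught at all). CPython runs Python-level signal handlers in the main thread between bytecode instructions, so the handler can inspect the processing flag: if a message is in progress it only sets stop_requested, otherwise it exits immediately:

```python
import signal


class GracefulBot:
    """Hypothetical bot skeleton; not IntelMQ's real Bot class."""

    def __init__(self, messages):
        self.queue = list(messages)   # stand-in for the bot's input queue
        self.processed = []
        self.processing = False       # True while an event is being handled
        self.stop_requested = False

    def install_handler(self, signum=signal.SIGTERM):
        # Python only allows installing handlers from the main thread.
        signal.signal(signum, self.handle_stop)

    def handle_stop(self, signum, frame):
        if self.processing:
            # Mid-event: remember the request and finish the event first.
            self.stop_requested = True
        else:
            # Idle (e.g. waiting for input): nothing in progress, stop right away.
            raise SystemExit(0)

    def process(self, message):
        # Placeholder for the real work, e.g. updating a ticketing system.
        self.processed.append(message)

    def run(self):
        while self.queue and not self.stop_requested:
            self.processing = True    # set *before* taking the message
            message = self.queue.pop(0)
            self.process(message)
            self.processing = False
        return self.processed
```

Typical use would be to create the bot, call `bot.install_handler()` from the main thread, and then `bot.run()`.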
We can also do a short meeting on this topic, if you'd like. Is anyone else also interested in signals/graceful shutdowns?
Sebastian
One can envision other approaches to implementing shutdown functionality, but they all tend to violate the KISS principle.
Br, Mika
----- Original Message ----- From: "Sebastian Wagner" wagner@cert.at To: "Mika Silander" mika.silander@csc.fi, "intelmq-dev" intelmq-dev@lists.cert.at Sent: Friday, 2 July, 2021 17:24:31 Subject: Re: [IntelMQ-dev] Preventing lost events when stopping intelmq
Hi Mika,
On 7/1/21 3:05 PM, Mika Silander wrote:
Returning to a similar issue from a different angle: for maintenance I'd like to be able to cleanly shut down the server running IntelMQ. Is there a way to guarantee that none of the bots is in a processing state (i.e. processing an event in the process method) before server shutdown? Can "intelmqctl stop", for example, stop the bot chain in such a way that none of the bots is in the middle of processing an event? If not, what would be the best approach for achieving this?
I have two answers to offer:
The kill signal is destructive and interrupts syscalls, so after receiving it the bot cannot just continue where it stopped. As far as I know it's currently not possible to circumvent this except with threading, where the main thread receives the signal and could then wait for the other threads to finish processing. That would be a cool feature :) Related feature request (of my own): https://github.com/certtools/intelmq/issues/1298
The other answer is: you may simply ignore this. You won't lose any data, as the message on the input side is only deleted after it has been processed completely and sent to the next queue. But you can end up with duplicated messages, especially if you kill a parser that is in the middle of parsing a large report. In principle it could happen for all bots, if you kill them after they sent the message but just before they acknowledged it, though I consider that very improbable. You can prevent this by placing another deduplicator just before your output bot(s).
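The at-least-once behaviour described above can be illustrated with a small simulation. The lists standing in for queues, process_once and the in-memory Deduplicator are hypothetical simplifications (IntelMQ's own deduplicator expert works differently, e.g. it keeps its state in Redis); the point is only the ordering: forward first, acknowledge second, so a kill in between produces a duplicate rather than a loss, and a hash-based filter before the output removes it:

```python
import hashlib
import json


def fingerprint(event):
    """Stable hash of an event dict (hypothetical helper)."""
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()


def process_once(source, destination, crash_before_ack=False):
    """One at-least-once step: peek, forward, and only then acknowledge."""
    event = source[0]                 # peek; the event stays in the input queue
    destination.append(dict(event))   # "send" to the next queue
    if crash_before_ack:
        return                        # simulated kill: sent but not acknowledged
    source.pop(0)                     # acknowledge: only now remove it from the input


class Deduplicator:
    """Hypothetical in-memory deduplicator placed just before the output bot."""

    def __init__(self):
        self.seen = set()

    def filter(self, events):
        unique = []
        for event in events:
            key = fingerprint(event)
            if key not in self.seen:  # drop events already forwarded once
                self.seen.add(key)
                unique.append(event)
        return unique
```

Killing the bot between the send and the acknowledgement leaves the first event in the input queue, so after a restart it is sent a second time; the deduplicator then forwards only one copy.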
I assume these are not the answers you were looking for and hope they don't spoil your mood just before the weekend =)
best regards Sebastian