Morning all,
Ignore my two messages from yesterday. Signaling works fine, the problem in our case is the very long wait between the signaling and when the reload is effectuated. This morning I had a workable interleaving of events such that it was easy to see. From my point of view it would be handy if all bots could handle the reload also if they happen to be idle waiting for new incoming events since that would make the delay between signaling and reloading short. However, even that does leave a small time window where one of our monitoring scripts could be working on void log file handles. So, looks like we have to devise a way to detect those void handles in our monitoring scripts.
Br, Mika
----- Original Message ----- From: "Mika Silander" mika.silander@csc.fi To: "intelmq-dev" intelmq-dev@lists.cert.at Sent: Monday, 23 October, 2023 16:48:45 Subject: Re: [IntelMQ-dev] Expected behaviour of bot reloading after log rotation
... and one small addition to the below: it appears psutil 5.9.0 invokes os.kill() internally, and this latter succeeds, but no signal is delivered.
Br, Mika
----- Original Message ----- From: "Mika Silander" mika.silander@csc.fi To: "intelmq-dev" intelmq-dev@lists.cert.at Sent: Monday, 23 October, 2023 16:37:17 Subject: Re: [IntelMQ-dev] Expected behaviour of bot reloading after log rotation
Hi Sebastian, all,
Started the week debugging. First I answer my question earlier about "expected [signaling] behaviour" with respect to reloading:
"Yes, all the bots in the configured botnet are sent a SIGHUP signal one-by-one when intelmqctl reload is run."
Then the bad news: I see SIGHUP successfully sent to the Mail Attachment Collector bot _only_. Why on earth does the signaling of the other bots fail, I don't have a clue. The other bots are sent the same signal, but apparently it does not get delivered. All PIDs were correct. So, the problem must be inside psutil 5.9.0 that offers POSIX signaling.
If someone is on a
Ubuntu 22.04 LTS Python 3.10.12 intelmq 3.2.+ psutil 5.9.0 (from python3-psutil apt package)
I would appreciate to know what is the outcome of your invocation of (and running as root):
sudo -u intelmq /usr/bin/intelmqctl reload
and let me know what hits you get with
grep SIGHUP /var/log/intelmq/*.log
after the reload if any (you may need to adapt the log file paths based on your particular installation of intelmq).
Thank you and br, Mika
----- Original Message ----- From: "Sebix" sebix@sebix.at To: "intelmq-dev" intelmq-dev@lists.cert.at, "Mika Silander" mika.silander@csc.fi Sent: Thursday, 19 October, 2023 22:19:17 Subject: Re: [IntelMQ-dev] Expected behaviour of bot reloading after log rotation
Hi Mika
To explain the behavior (partly): Signals have an interrupting nature on connections to redis, sleep()[0], and other things. The interrupt itself is caught by the method __handle_sighup_signal.
Image a parser bot is processing a (potentially large) report. Then, logrotate sends the SIGHUP signal. If the parser now reloaded, the Bot needed to re-initialize, interrupting the processing. There are also other circumstances when a re-initialization is "sub-optimal".
Thus, handling the signal is postponed until it is safe to do so, for example, after every process call and after receiving a message from the pipeline.
This logic can be revisited now. The core received quite a few updates since the implementation of reloading, and there are possibly more places in the Bot's code where the reloading can be triggered if necessary.
[0]: That's why e.g. Bot.__sleep, Bot.receive_message contain while-loops, in case a SIGHUP interrupted them
Sebastian
Institute for Common Good Technology gemeinnütziger Kulturverein - nonprofit cultural society https://commongoodtechnology.org/ ZVR 1510673578
On 10/19/23 14:35, Kamil Mankowski via IntelMQ-dev wrote:
Hey,
There is a bot's flag ("_sighup_delay") controlling weather a bot reloads immediate or waits. Looking at the source code, it looks like an expected behaviour, see:
https://github.com/certtools/intelmq/blob/d624988693894e71defdd8b18618f7af98...
and:
https://github.com/certtools/intelmq/blob/d624988693894e71defdd8b18618f7af98...
However, I understand your concern: this is okay if the reload happens during the one day (logrotate keeps the old log file untouched until next run), but it can cause troubles, if there is no message on one day.
Best regards, Kamil Mankowski CERT.at GmbH www.cert.at
On 10/19/23 13:29, Mika Silander wrote:
Hi all,
Apologies for a lengthy message, but I had a set of monitoring scripts that worked like a charm on a physical machine but when moved to a virtual machine with a newer Ubuntu, I see odd failures in the form of monitoring script processes left hanging on void file handles. I suspect the problem is somehow related to log rotation in connection to bot reloading but I can't figure out how. A few questions at the end.
Our setup for reference:
Ubuntu 22.04 LTS python 3.10.12 intelmq 3.2.2a1 intelmq-api 3.1.0-rc1 intelmq-manager 3.2.0
We have a simple bot chain: Mail Attachment Collector -> Shadowserver Parser -> N expert bots -> 1 output bot.
/etc/logrotate.d/intelmq contains:
/var/log/intelmq/*.log { su intelmq intelmq daily maxsize 10M rotate 60 notifempty compress delaycompress create 644 intelmq intelmq sharedscripts postrotate sudo -u intelmq /usr/bin/intelmqctl --quiet reload endscript }
The cron daemon runs run-parts for targets under /etc/cron.daily/ at 06:25, including /etc/cron.daily/logrotate. Thus, "intelmqctl --quiet reload" gets also executed after all the intelmq log files have been rotated.
What I see is that no bot reloads soon after run-parts has finished. The first bot to perform a reload is the Mail Attachment Collector immediately after midnight. The others remain as is until the Mail Attachment Collector reads the day's first report and forwards it to the Shadowserver Parser bot. The parser bot then performs a reload and forwards the events to the first expert in the chain. Once the first event from the parser trickles through the entire bot chain, every bot in turn first performs a reload and then processes the event. In practise, the logrotate triggered bot reload can thus be delayed almost a full day - our first mail reports tend to arrive during the very early morning hours.
My question is, is the above reload behaviour as expected? I found it strange that the Mail Attachment Collector reloads at midnight even though it has processed new mail reports after the "intelmqctl reload" call done by logrotate. I would have expected the Mail Attachment Collector to reload itself when the first report arrives for processing after 06:25, i.e. the first report after the call to intelqmctl reload. And, what controls the timing of the Mail Attachment Collector's reload? All its reloads I see in older log files indicate that the reload is consistently run after midnight irrespective of incoming reports and the timing of the intelmqctl reload command.
Thanks again and best regards from a puzzled Mika
IntelMQ-dev mailing list https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://intelmq.readthedocs.io/
IntelMQ-dev mailing list https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://intelmq.readthedocs.io/
_______________________________________________ IntelMQ-dev mailing list https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://intelmq.readthedocs.io/ _______________________________________________ IntelMQ-dev mailing list https://lists.cert.at/cgi-bin/mailman/listinfo/intelmq-dev https://intelmq.readthedocs.io/