[IntelMQ-dev] Bot behaviour in case of unrecoverable errors
Mika Silander
mika.silander at csc.fi
Tue Feb 16 11:58:41 CET 2021
Hi Sebastian,
Thanks for answering. I've been busy with other things so np with a delayed answer.
As for my question on how to react when a bot "dies", I see that it should be rephrased
as "how to react to exceptions in a bot?". The error handling URL below suggests
that I could set the parameter error_procedure (+ error_max_retries + error_retry_delay), and that should cover
what is needed, especially if a restart after this requires manual operation.
So I can discard the (elaborate) option of making the bot always analyze its own log entries
(to discover repetitive failures/exceptions) at startup.
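For illustration, the three parameters named above could be collected per bot roughly as follows. This is only a sketch: the bot id my-parser-bot and the values are hypothetical, and the assumption that error_procedure takes "stop" or "pass" should be checked against the error handling documentation linked below.

```python
import json

# Hypothetical per-bot settings; the bot id and all values are examples only.
bot_settings = {
    "my-parser-bot": {
        "parameters": {
            # Assumption: "stop" halts the bot once the retries are used up,
            # "pass" skips the failing message and carries on.
            "error_procedure": "stop",
            "error_max_retries": 3,   # retries before error_procedure applies
            "error_retry_delay": 15,  # seconds to wait between retries
        }
    }
}

# Print the settings as JSON so they can be compared with the real configuration.
print(json.dumps(bot_settings, indent=4))
```

With "stop" the bot would stay down until someone restarts it manually, which is the behaviour described above.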
Cheers, Mika
----- Original Message -----
From: "Sebastian Wagner" <wagner at cert.at>
To: "Mika Silander" <mika.silander at csc.fi>, "intelmq-dev" <intelmq-dev at lists.cert.at>
Sent: Friday, 12 February, 2021 22:45:35
Subject: Re: [IntelMQ-dev] Bot behaviour in case of unrecoverable errors
Dear Mika,
Sorry for the late response. I have seen the mail, but postponed
answering to later and then I forgot...
On 2/2/21 1:14 PM, Mika Silander wrote:
> Trying to assess what safeguards are sufficient: what happens when a bot has some internal failure and it "dies"?
IntelMQ has internal error handling, so an exception thrown e.g. in
the bot's process() method does not lead to the bot dying. Documentation
on this can be found at
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html#error-handling
Please let us know if information is missing there so we can improve it.
> Will intelmq restart the bot automatically or will it be up to the admin of intelmq to manually restart it?
Currently there is no such automatic restart by default. As of now, IntelMQ
has no watcher/supervising daemon of its own, but we have
- integration into supervisord:
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html#using-supervisor-as-process-manager-beta
- and a script to generate systemd service files for bots:
https://github.com/certtools/intelmq/tree/develop/contrib/systemd (which,
as I am reminded just now, is really badly documented)
> And if automatic restarts is the norm, how could one stop the bot from processing new incoming messages if say,
> X consecutive failures like these have happened within the time frame of the last 5 minutes?
The error handling takes care of that. By default, the bot tries to
process a message up to three times and then gives up on it, dumps
it to disk for later inspection by the administrator, and continues
with the next message. The erroneous message is removed from the queue.
For parsers you can reduce the parameter error_max_retries, as they
don't depend on external resources and temporary failures can't happen.
For experts which make external lookups, retries are perfectly fine.
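The retry-then-dump behaviour described above can be sketched in plain Python. This is not IntelMQ's actual code, only a minimal model of the idea: the names run, parse, forwarded and dumped are made up for the example, and the dump is an in-memory list instead of a file on disk.

```python
import time


def run(queue, process, max_retries=3, retry_delay=0.0):
    """Process each message, trying up to max_retries times.

    A message that still fails on the last attempt is set aside in `dumped`
    and never reaches `forwarded`; processing then continues with the next one.
    """
    forwarded, dumped = [], []
    for message in queue:
        for attempt in range(1, max_retries + 1):
            try:
                forwarded.append(process(message))
                break  # success, move on to the next message
            except Exception as exc:
                if attempt == max_retries:
                    # Give up on this message and keep it for inspection.
                    dumped.append((message, repr(exc)))
                else:
                    time.sleep(retry_delay)
    return forwarded, dumped


def parse(raw):
    # Stand-in for a parser: fails deterministically on malformed input,
    # so retrying it cannot help.
    return int(raw)


forwarded, dumped = run(["1", "oops", "3"], parse)
print(forwarded)                  # [1, 3]
print([msg for msg, _ in dumped])  # ['oops']
```

The malformed message is retried, set aside and skipped. Since the parser fails the same way every time, a lower max_retries would give the same result with less wasted work.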
For more information on the dumping functionality and how to process
these dumps, see
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html#tool-intelmqdump
> By writing some log
> entries at bot startup and then making the bot itself analyze the log at every restart?
>
> I'm trying to make sure a burst of erroneous/malformed events are not accidentally forwarded by a malfunctioning
> or partially functioning bot.
That won't happen, except if you explicitly configure IntelMQ to do so.
Hope that helps. If it doesn't, don't hesitate to ask :)
best regards
Sebastian
--
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg