[IntelMQ-dev] Bot behaviour in case of unrecoverable errors

Fri Feb 12 21:45:35 CET 2021

Dear Mika,

Sorry for the late response. I saw your mail, postponed answering it,
and then forgot...

On 2/2/21 1:14 PM, Mika Silander wrote:
> Trying to assess what safeguards are sufficient: what happens when a bot has some internal failure and it "dies"?

IntelMQ has internal error handling, so an exception thrown e.g. in
the bot's process() method does not cause the bot to die. Documentation
on this can be found at
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html#error-handling

Please let us know if information is missing there so we can improve it.

> Will intelmq restart the bot automatically or will it be up to the admin of intelmq to manually restart it?

Currently there is no automatic restart by default. As of now, IntelMQ
itself has no watcher/supervising daemon, but we have

- integration into supervisord:
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html#using-supervisor-as-process-manager-beta
- and a script to generate systemd service files for bots:
https://github.com/certtools/intelmq/tree/develop/contrib/systemd (and,
as I am reminded just now, that is really badly documented)

> And if automatic restarts is the norm, how could one stop the bot from processing new incoming messages if say,
> X consecutive failures like these have happened within the time frame of the last 5 minutes?

The error handling takes care of that. By default, the bot tries to
process a message up to three times and then gives up on it, dumps
it to disk for further inspection by the administrator, and continues
with the next message. The erroneous message is removed from the queue.
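
To illustrate the behaviour described above, here is a minimal sketch in
plain Python. It is not IntelMQ's actual implementation: the function
names, the max_attempts argument and the JSON-lines dump format are all
hypothetical, and the delay between attempts is left out for brevity.

```python
import json


def process_with_retries(message, process, dump_path, max_attempts=3):
    """Call process(message) up to max_attempts times.

    Returns True on success. If every attempt raises, the message is
    appended to dump_path (hypothetical format) and False is returned.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            process(message)
            return True
        except Exception as exc:  # the bot must survive errors in process()
            last_error = exc
    # All attempts failed: keep the message on disk for the administrator.
    record = {"message": message, "error": repr(last_error)}
    with open(dump_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return False


def run_queue(queue, process, dump_path, max_attempts=3):
    """Drain the queue; a failing message is dumped and dropped, not forwarded."""
    forwarded = []
    while queue:
        message = queue.pop(0)  # the message leaves the queue either way
        if process_with_retries(message, process, dump_path, max_attempts):
            forwarded.append(message)
    return forwarded
```

In this sketch a message that keeps failing ends up in the dump file and
never appears in the forwarded list, while the messages after it are
processed as usual.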

For parsers you can reduce the parameter error_max_retries: they
don't depend on external resources, so temporary failures can't happen.
For experts that make external lookups, retries are perfectly fine.

For more information on the dumping functionality and how to process
these dumps, see
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html#tool-intelmqdump

> By writing some log
> entries at bot startup and then making the bot itself analyze the log at every restart? 
>
>  I'm trying to make sure a burst of erroneous/malformed events are not accidentally forwarded by a malfunctioning
> or partially functioning bot.

That won't happen, except if you explicitly configure IntelMQ to do so.

Hope that helps. If it doesn't, don't hesitate to ask :)

best regards
Sebastian

-- 
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
