Dear Mika,

On 3/23/22 1:27 PM, Mika Silander wrote:
 Calling out to the list again before trying to reinvent a wheel. We have a collector that tries to connect to the source of security events, but the source service is (temporarily) inaccessible. IntelMQ restarts the bot, an error message and the connection failure exception gets output to the log, and after a while, the bot gets restarted. For us it is acceptable there's a few failures in a configurable time frame, but if the situation prevails, we'd like to be alerted and if possible, prevent the bot from restarting.

 There are bot configuration parameters to stop a bot in case the processing of events fails repeatedly (error_procedure=stop, error_max_retries=suitable_number as in https://intelmq.readthedocs.io/en/maintenance/user/configuration-management.html#id14), but after some experimenting it appears they are not applicable here since we never reach the stage of processing events.

If the fetch fails, an exception is thrown in process(). That enters the error handling in start() as described in the documentation. But collectors are a bit different from parsers, experts and outputs, as they don't have an incoming message as a trigger, but operate on their own, e.g. with rate limiting.

Reading the code, I think your bot should stop, as error_on_message is active in this case, which increases the error counter. When the counter hits the error_max_retries limit, the bot should stop if error_procedure == "stop".
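
To illustrate the logic as I read it, here is a simplified sketch in Python. It is not the actual IntelMQ code: the function name run_with_retries, the max_calls limit and the exact comparison against error_max_retries are assumptions made for the illustration. Only the parameter names error_max_retries and error_procedure come from the configuration.

```python
def run_with_retries(process, error_max_retries=3, error_procedure="stop",
                     max_calls=20):
    """Call process() up to max_calls times and model the retry counter.

    Returns ("stopped", n) if the bot stops on the n-th call,
    otherwise ("running", max_calls).
    """
    error_counter = 0
    for call in range(1, max_calls + 1):
        try:
            process()
            error_counter = 0  # a successful run resets the counter
        except Exception:
            error_counter += 1
            # Assumption: the limit counts retries, so the bot gives up
            # once the counter exceeds it.
            if error_counter > error_max_retries:
                if error_procedure == "stop":
                    return ("stopped", call)
                # Any other procedure: start a new round and keep running
                error_counter = 0
    return ("running", max_calls)
```

With a process() that always raises and error_max_retries=3, this model stops on the fourth call. With any other error_procedure it keeps running, which would match the restarts you see.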

 Any ideas how to address this issue? We'd prefer not to touch the bot's source code.

IntelMQ by itself does not notify you of any failures. I have used the logcheck rules to monitor IntelMQ's logs:
https://github.com/certtools/intelmq/tree/develop/contrib/logcheck
While that's not a perfect solution, it is super easy to set up. For anything more advanced, integration into a monitoring system would be feasible.
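
If you go the monitoring route, the "a few failures in a configurable time frame are fine" rule could be checked outside the bot. Below is a minimal sketch in Python, assuming your monitoring script already extracts the timestamps of failures from the log in chronological order. The class FailureWindow and its parameters are hypothetical and not part of IntelMQ.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureWindow:
    """Track failures and report when too many fall into a time window."""

    def __init__(self, max_failures, window):
        self.max_failures = max_failures  # tolerated failures per window
        self.window = window              # a datetime.timedelta
        self._failures = deque()

    def record(self, when):
        """Record a failure at `when`; return True if an alert is due.

        Assumes failures are recorded in chronological order.
        """
        self._failures.append(when)
        # Drop failures that are older than the window
        while when - self._failures[0] > self.window:
            self._failures.popleft()
        return len(self._failures) > self.max_failures


# Usage: tolerate 2 failures per 10 minutes, alert on the third
tracker = FailureWindow(max_failures=2, window=timedelta(minutes=10))
start = datetime(2022, 3, 23, 13, 0)
alerts = [tracker.record(start + timedelta(minutes=m)) for m in (0, 1, 2)]
```

Here only the third failure triggers an alert. Failures spaced further apart than the window never do.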

Kind regards,
Sebastian