Hi Sebastian, I've added the Redis exception to the attachment. In that case I would expect the bot to keep trying to connect to Redis rather than give up and exit.
I use continuous run mode for all bots.
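A minimal sketch of the "keep trying to connect" behaviour described above, not intelmq's actual implementation. The function name `connect_with_retry`, its parameters, and the backoff values are hypothetical. It catches Python's built-in `ConnectionError`; with redis-py one would catch `redis.exceptions.ConnectionError` instead, which is a separate class.

```python
import time


def connect_with_retry(connect, retry_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `connect` until it succeeds, waiting between failed attempts.

    `connect` is any zero-argument callable that raises ConnectionError
    while the broker is unavailable (hypothetical interface).
    Returns a tuple (connection, number_of_attempts).
    """
    delay = retry_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except ConnectionError as exc:
            # Log the failure and keep going instead of exiting.
            print(f"connect attempt {attempts} failed: {exc}; retrying in {delay}s")
            sleep(delay)
            # Exponential backoff, capped so the wait never grows unbounded.
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; the loop itself never gives up, which matches the expectation for continuous run mode.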
I've also extracted an example of the other behaviour, that is, exiting without logging that the bot stopped. That is indeed what I meant (your last point): the bot logs the exception but does not log the line "Bot stopped", and then it stops, which is what the status check is reporting.

2018-10-02 02:43:13,744 - output - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/lib/bot.py", line 167, in start
    self.process()
  File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/bots/outputs/bot/output.py", line 67, in process
    status = self.db_check()
  File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/bots/outputs/bot/output.py", line 53, in db_check
    payload = self.connection_blacklist.get(key).decode("utf-8", errors="ignore")
AttributeError: 'NoneType' object has no attribute 'decode'
2018-10-02 02:43:13,744 - output - INFO - Current Message(event): {"some event"}.
2018-10-02 02:43:13,745 - output - INFO - Bot will continue in 0 seconds.
2018-10-02 02:43:35,997 - whitelist-output - ERROR - Bot has found a problem.
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'decode'
2018-10-02 02:43:35,998 - whitelist-output - INFO - Current Message(event): {'feed.accuracy': 100.0, 'feed.name': 'whalebone', 'feed.url': ' http://wb-whitelist.azurewebsites.net/whitelist.txt', 'time.observation': '2018-10-02T02:41:58+00:00', 'source.fqdn': 'com.bd', 'raw': 'Y29tLmJkDQo='}.
2018-10-02 02:43:35,998 - whitelist-output - INFO - Dumping message from pipeline to dump file.
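The `AttributeError` in the traceback above comes from calling `.decode()` on the result of `.get(key)` when that result is `None`. Assuming `connection_blacklist` is a redis-py client (the log does not say so explicitly), `get` returns `None` for a missing key, so a guard along these lines would avoid the exception. The helper name `decode_payload` is hypothetical and is not part of intelmq or of the bot in the traceback.

```python
def decode_payload(raw, encoding="utf-8"):
    """Decode a value fetched from the key-value store.

    Returns None when the key was missing (raw is None) instead of
    raising AttributeError on `None.decode(...)`.
    """
    if raw is None:
        return None
    return raw.decode(encoding, errors="ignore")


# Stand-in for the store: a plain dict whose .get() also returns None
# for missing keys (assumption made so the example runs without Redis).
store = {"example.com": b"listed"}

present = decode_payload(store.get("example.com"))
missing = decode_payload(store.get("com.bd"))
```

The caller then has to decide what a missing key means for the event, for example treating it as "not listed", rather than letting the bot hit its error procedure.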
Sincerely, Václav Brůžek
On Wed, 10 Oct 2018 at 15:36, Sebastian Wagner wagner@cert.at wrote:
Hi Václav,
I can't estimate the implications of the docker usage on redis and intelmq.
Concerning the redis problem: There were no changes in the code handling redis problems, and the only case in which intelmq's bots do not log anything is when there are not enough resources (memory, disk) to shut down cleanly. Even then, there's output on stdout. You could log stdout and see if there are any errors shown at the end.
Concerning the error handling and sudden stops: There haven't been any code changes there either. Do you use the scheduled run mode? If the error_procedure is pass and there are pipeline problems, the bot stops (in bot.py search for "error_procedure: pass and pipeline problem"). AFAIR the reasoning for this was/is that if the bot did not stop, the pipeline would be kind of DoS'ed. But as problems with memory and snapshots in redis are handled better now, that could be relaxed. I'll do some experiments.
Concerning "encounters an exception and logs nothing but status check reports that the bot is not running": How do you know that the bot encountered an exception if nothing is logged? Is the bot then still running or not?
Sebastian
On 09/10/2018 12.58, Vaclav Bruzek wrote:
Hi, no, there are no modifications to the intelmq code. The situation occurs with my custom bots as well as the default ones. As an example of this behaviour: recently the Redis broker wasn't available for some time, and as a result almost all bots stopped without any log message indicating that the bot stopped.
Sincerely, Václav Brůžek
On Tue, 9 Oct 2018 at 12:01, Sebastian Wagner wagner@cert.at wrote:
Hi,
I'm not aware of any problems so far. Do you use any custom modifications in the code? If yes, which?
Sebastian
On 09/10/2018 10.42, Vaclav Bruzek wrote:
Hi, since upgrading to version 1.1.0, the stability of the bots has become quite a big problem. It often happens that a bot either encounters an exception and logs that the bot is stopped, or encounters an exception and logs nothing, but the status check reports that the bot is not running. I'm using the 'error_procedure' parameter set to 'pass' (with error_max_retries and error_retry_delay set to 0), and I've always thought that this is a sort of 'run forever' parameter, so that even when an exception occurs the bot will keep on doing its job. I'm using intelmq in a Docker environment with ubuntu 18.04 as base.
Sincerely, Václav Brůžek
--
// Sebastian Wagner wagner@cert.at - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg