Hi,

The traceback in the email shows an exception in a custom bot. Without the code, it's hard to say what's going on.
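
One thing the traceback itself does show is that `.decode()` was called on `None`. With redis-py, `get()` returns `None` when the key does not exist, so if `connection_blacklist` is a Redis client (an assumption, I have not seen the bot), a missing key would produce exactly this AttributeError. A minimal sketch of a guard follows; `decode_value` is a hypothetical helper name, not something from intelmq:

```python
def decode_value(raw, encoding="utf-8"):
    """Decode a bytes value fetched from Redis.

    Returns None when the key was missing (raw is None) instead of
    raising AttributeError on raw.decode().
    """
    if raw is None:
        return None
    return raw.decode(encoding, errors="ignore")


# Hypothetical use inside the custom bot's db_check(), assuming
# self.connection_blacklist is a redis-py client:
#   payload = decode_value(self.connection_blacklist.get(key))
#   if payload is None:
#       ...  # key not present, handle that case explicitly
```

Whether a missing key should count as "not blacklisted" or as an error is a decision for the bot's author.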

The exceptions attached do contain the following redis error message:
> redis.exceptions.BusyLoadingError: Redis is loading the dataset in memory

It looks like Redis was still starting up and loading its dataset. In this case the bot could wait (up to a maximum time) for as long as this error occurs and then continue. Someone still needs to implement that.
I opened https://github.com/certtools/intelmq/issues/1334 for it.
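
To illustrate the idea, here is a rough sketch of such a wait loop. This is not how intelmq does it today. The name `wait_until_ready` and its parameters are made up for this example, and the helper is kept generic so it can be tried without a Redis server:

```python
import time


def wait_until_ready(ping, busy_error, max_wait=60.0, delay=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call ping() until it stops raising busy_error.

    Retries every `delay` seconds for at most `max_wait` seconds.
    Returns the number of attempts made. If the deadline has passed
    and ping() still raises busy_error, that exception is re-raised.
    Any other exception propagates immediately.
    """
    deadline = clock() + max_wait
    attempts = 0
    while True:
        attempts += 1
        try:
            ping()
            return attempts
        except busy_error:
            if clock() >= deadline:
                raise  # waited long enough, give up
            sleep(delay)


# Possible use with redis-py (assumes the redis package is installed
# and a server is listening on the given host and port):
#   import redis
#   client = redis.Redis(host="127.0.0.1", port=6379)
#   wait_until_ready(client.ping, redis.exceptions.BusyLoadingError,
#                    max_wait=120.0, delay=2.0)
```

The maximum wait time would probably have to be a bot parameter, as loading a large dataset can take a while.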

Sebastian

On 11/10/2018 08.47, Vaclav Bruzek wrote:
Hi Sebastian,
I've added the Redis exception to the attachment. That is the case where I would expect the bot to keep trying to connect to Redis rather than give up and exit.

I use continuous run mode for all bots. 

I've also extracted an example of the other behaviour, that is, exiting without logging that the bot stopped. That is indeed what I meant (your last point): the bot logs the exception but does not log the line "Bot stopped", and yet it stops, which is what the status check is reporting.

2018-10-02 02:43:13,744 - output - ERROR - Bot has found a problem.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/lib/bot.py", line 167, in start
    self.process()
  File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/bots/outputs/bot/output.py", line 67, in process
    status = self.db_check()
  File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/bots/outputs/bot/output.py", line 53, in db_check
    payload = self.connection_blacklist.get(key).decode("utf-8", errors="ignore")
AttributeError: 'NoneType' object has no attribute 'decode'
2018-10-02 02:43:13,744 - output - INFO - Current Message(event): {"some event"}.
2018-10-02 02:43:13,745 - output - INFO - Bot will continue in 0 seconds.
2018-10-02 02:43:35,997 - whitelist-output - ERROR - Bot has found a problem.
Traceback (most recent call last):

AttributeError: 'NoneType' object has no attribute 'decode'
2018-10-02 02:43:35,998 - whitelist-output - INFO - Current Message(event): {'feed.accuracy': 100.0, 'feed.name': 'whalebone', 'feed.url': 'http://wb-whitelist.azurewebsites.net/whitelist.txt', 'time.observation': '2018-10-02T02:41:58+00:00', 'source.fqdn': 'com.bd', 'raw': 'Y29tLmJkDQo='}.
2018-10-02 02:43:35,998 - whitelist-output - INFO - Dumping message from pipeline to dump file.


Sincerely,
Václav Brůžek


On Wed, 10 Oct 2018 at 15:36, Sebastian Wagner <wagner@cert.at> wrote:

Hi Václav,

I can't estimate the implications of the docker usage on redis and intelmq.

Concerning the redis problem: There were no changes in the code handling redis problems, and the only case in which intelmq's bots do not log anything is when there are not enough resources (memory, disk) to shut down cleanly. Even then, there's output on stdout. You could log stdout and see if any errors are shown at the end.

Concerning the error handling and sudden stops: There haven't been any code changes here either. Do you use the scheduled run mode? If the error_procedure is pass and there are pipeline problems, the bot stops (in bot.py search for "error_procedure: pass and pipeline problem"). AFAIR the reasoning for this was/is that if the bot did not stop, the pipeline would be kind of DOS'ed. But as problems with memory and snapshots in redis are handled better now, that could be relaxed. I'll do some experiments.

Concerning "encounters an exception and logs nothing but status check reports that the bot is not running": How do you know that the bot encountered an exception if nothing is logged? Is the bot then still running or not?

Sebastian

On 09/10/2018 12.58, Vaclav Bruzek wrote:
Hi,
No, there are no modifications to the intelmq code. The situation occurs with my custom bots as well as the default ones. As an example of this behaviour: recently the Redis broker wasn't available for some time, and as a result almost all bots stopped without any log message indicating that the bot stopped.

Sincerely,
Václav Brůžek


On Tue, 9 Oct 2018 at 12:01, Sebastian Wagner <wagner@cert.at> wrote:

Hi,

I am not aware of any such problems so far. Do you use any custom modifications in the code? If yes, which?

Sebastian

On 09/10/2018 10.42, Vaclav Bruzek wrote:
Hi,
Since upgrading to version 1.1.0, the stability of bots has become quite a big problem. It often happens that a bot encounters an exception and logs that it has stopped, or encounters an exception and logs nothing, but the status check reports that the bot is not running. I'm using the 'error_procedure' parameter set to 'pass' (with error_max_retries and error_retry_delay set to 0), and I've always thought that this is a sort of 'run forever' setting, so that even when an exception occurs the bot keeps on doing its job. I'm using intelmq in a Docker environment with Ubuntu 18.04 as the base.

Sincerely,
Václav Brůžek

-- 
// Sebastian Wagner <wagner@cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg