[Intelmq-users] Bots stopping
Sebastian Wagner
wagner at cert.at
Thu Oct 11 13:30:56 CEST 2018
Hi,
The traceback in the email shows an exception in a custom bot. Without
the code, it's hard to say what's going on.
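Without the bot's code this is only a guess, but the AttributeError in the quoted traceback is what Python raises when .decode() is called on None, and a lookup such as redis-py's get() returns None for a key that does not exist. A minimal sketch of guarding against that, assuming connection_blacklist behaves like such a client (decode_value is a hypothetical helper, and a plain dict stands in for the connection):

```python
def decode_value(raw, encoding="utf-8"):
    """Decode bytes from a key lookup; return None if the key was absent."""
    if raw is None:
        # Missing key: nothing to decode, let the caller decide what to do.
        return None
    return raw.decode(encoding, errors="ignore")


# Assumption: a dict stands in for the bot's connection_blacklist here.
store = {"known-key": b"listed"}
print(decode_value(store.get("known-key")))
print(decode_value(store.get("missing-key")))
```

The caller would then have to treat None as "not on the list" instead of assuming a string.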
The exceptions attached do contain the following redis error message:
> redis.exceptions.BusyLoadingError: Redis is loading the dataset in memory
It looks like Redis was still starting up. In this case the bot could
keep waiting (up to a maximum time) for as long as this error occurs and
then continue. Someone still needs to implement that.
I opened https://github.com/certtools/intelmq/issues/1334 for it.
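The wait-and-retry idea could be sketched roughly as follows. This is not IntelMQ code: call_when_loaded, max_wait and delay are hypothetical names, and the local BusyLoadingError class only stands in for redis.exceptions.BusyLoadingError so that the sketch runs without redis-py installed.

```python
import time


class BusyLoadingError(Exception):
    """Stand-in for redis.exceptions.BusyLoadingError (assumption for this sketch)."""


def call_when_loaded(operation, busy_exc=BusyLoadingError, max_wait=60.0,
                     delay=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call operation(), retrying while it raises busy_exc.

    Retries for at most max_wait seconds, pausing delay seconds between
    attempts. Any other exception propagates immediately; busy_exc is
    re-raised once the deadline has passed.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return operation()
        except busy_exc:
            if clock() >= deadline:
                # Waited long enough: give up and let the caller handle it.
                raise
            sleep(delay)
```

With redis-py, one would presumably pass busy_exc=redis.exceptions.BusyLoadingError and wrap the actual call, for example operation=lambda: client.ping(). The sleep and clock parameters exist only to make the helper easy to test.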
Sebastian
On 11/10/2018 08.47, Vaclav Bruzek wrote:
> Hi Sebastian,
> I've added the Redis exception to the attachment. That is a case
> where I would expect the bot to keep trying to connect to Redis
> rather than give up and exit.
>
> I use continuous run mode for all bots.
>
> I've also extracted an example of the other behaviour, that is,
> exiting without logging that the bot stopped. That is indeed what I
> meant (your last point): the bot logs the exception but doesn't
> log the line "Bot stopped", and it stops, which is what the status
> check is reporting.
>
> 2018-10-02 02:43:13,744 - output - ERROR - Bot has found a problem.
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/lib/bot.py", line 167, in start
>     self.process()
>   File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/bots/outputs/bot/output.py", line 67, in process
>     status = self.db_check()
>   File "/usr/local/lib/python3.6/dist-packages/intelmq-1.1.0-py3.6.egg/intelmq/bots/outputs/bot/output.py", line 53, in db_check
>     payload = self.connection_blacklist.get(key).decode("utf-8", errors="ignore")
> AttributeError: 'NoneType' object has no attribute 'decode'
> 2018-10-02 02:43:13,744 - output - INFO - Current Message(event): {"some event"}.
> 2018-10-02 02:43:13,745 - output - INFO - Bot will continue in 0 seconds.
> 2018-10-02 02:43:35,997 - whitelist-output - ERROR - Bot has found a problem.
> Traceback (most recent call last):
>
> AttributeError: 'NoneType' object has no attribute 'decode'
> 2018-10-02 02:43:35,998 - whitelist-output - INFO - Current Message(event): {'feed.accuracy': 100.0, 'feed.name': 'whalebone', 'feed.url': 'http://wb-whitelist.azurewebsites.net/whitelist.txt', 'time.observation': '2018-10-02T02:41:58+00:00', 'source.fqdn': 'com.bd', 'raw': 'Y29tLmJkDQo='}.
> 2018-10-02 02:43:35,998 - whitelist-output - INFO - Dumping message from pipeline to dump file.
>
>
> Sincerely,
> Václav Brůžek
>
>
> On Wed, 10 Oct 2018 at 15:36, Sebastian Wagner <wagner at cert.at> wrote:
>
> Hi Václav,
>
> I can't estimate the implications of the docker usage on redis and
> intelmq.
>
> Concerning the redis problem: There were no changes in the code
> handling redis problems, and the only case where intelmq's bots do
> not log anything is when there are not enough resources (memory,
> disk) to shut down cleanly. Even then, there's output on stdout.
> You could log stdout and see if there are any errors shown at the
> end.
>
> Concerning the error handling and sudden stops: There haven't been
> any code changes there either. Do you use the scheduled run mode?
> If error_procedure is "pass" and there are pipeline problems, the
> bot stops (in bot.py, search for "error_procedure: pass and pipeline
> problem"). AFAIR the reasoning for this was that if the bot did
> not stop, the pipeline would effectively be DoS'ed. But since
> problems with memory and snapshots in redis are handled better
> now, that could be relaxed. I'll do some experiments.
>
> Concerning "encounters an exception and logs nothing but status
> check reports that the bot is not running": How do you know that
> the bot encountered an exception if nothing is logged? Is the bot
> then still running or not?
>
> Sebastian
>
> On 09/10/2018 12.58, Vaclav Bruzek wrote:
>> Hi,
>> no, there are no modifications to the intelmq code. The problem
>> occurs with my custom bots as well as the default ones. As an
>> example of this behaviour: recently the Redis broker wasn't
>> available for some time, and as a result almost all bots stopped
>> without any log message indicating that the bot stopped.
>>
>> Sincerely,
>> Václav Brůžek
>>
>>
>> On Tue, 9 Oct 2018 at 12:01, Sebastian Wagner <wagner at cert.at> wrote:
>>
>> Hi,
>>
>> I'm not aware of any such problems so far. Do you use any custom
>> modifications in the code? If yes, which?
>>
>> Sebastian
>>
>> On 09/10/2018 10.42, Vaclav Bruzek wrote:
>>> Hi,
>>> since upgrading to version 1.1.0, the stability of the bots has
>>> become quite a big problem. It often happens that a bot
>>> encounters an exception and logs that the bot is stopped, or
>>> encounters an exception and logs nothing while the status check
>>> reports that the bot is not running. I have the
>>> 'error_procedure' parameter set to 'pass'
>>> (with error_max_retries and error_retry_delay set to 0), and
>>> I've always thought of this as a sort of 'run forever'
>>> setting: even when an exception occurs, the bot keeps
>>> on doing its job. I'm using intelmq in a Docker environment
>>> with Ubuntu 18.04 as the base.
>>>
>>> Sincerely,
>>> Václav Brůžek
>>>
>> --
>> // Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
>> // CERT Austria - https://www.cert.at/
>> // Eine Initiative der nic.at GmbH - https://www.nic.at/
>> // Firmenbuchnummer 172568b, LG Salzburg
>>
--
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg