[IntelMQ-dev] storage of config (Re: [Intelmq-dev] IEP01: IntelMQ Configuration Handling)

Fri Jan 22 17:26:51 CET 2021

Hi,

On Thursday, 10 December 2020 13:17:45, Birger Schacht wrote:
> This part is about the question where do we store the
> configuration?.

Overall, I miss the use cases or problems that the proposed changes
should address. A problem description and links to discussions that have
already taken place would make it easier to comment on the proposal.

Some relevant places that describe wishes, status and suggestions:
  https://intelmq.readthedocs.io/en/latest/user/bots.html#common-parameters
  https://intelmq.readthedocs.io/en/latest/user/configuration-management.html
  https://github.com/certtools/intelmq/issues/267
    (Configurations - Hierarchy configurations) closed
  https://github.com/certtools/intelmq/issues/552
    (Enable separate packaging of bots by allowing addition and removals
    to the config)

> The ideas document[^8] on GitHub already proposes to remove the
> pipeline.conf and specifying the destination pipelines in the
> individual bot configuration part.  The declaration of the source
> queue can be dropped then as well, as it follows a rule anyway.

The idea sounds useful for decreasing the size of the configuration.
(Making something easier to understand is always a use case.)
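
A minimal sketch of how I read this idea, in Python. The bot ids, the
key name "destination_queues" and the naming rule "<bot id>-queue" are
assumptions for illustration only, not something the proposal fixes:

```python
# Hypothetical per-bot configuration: destinations are listed next to the
# bot, and no source queue is declared anywhere.
bots = {
    "example-collector": {"destination_queues": ["example-parser-queue"]},
    "example-parser": {"destination_queues": ["file-output-queue"]},
    "file-output": {"destination_queues": []},
}


def source_queue(bot_id: str) -> str:
    """Derive the source queue name from the bot id (assumed naming rule)."""
    return f"{bot_id}-queue"


def build_pipeline(bots: dict) -> dict:
    """Reconstruct the information pipeline.conf used to hold."""
    return {
        bot_id: {
            "source_queue": source_queue(bot_id),
            "destination_queues": list(conf.get("destination_queues", [])),
        }
        for bot_id, conf in bots.items()
    }


pipeline = build_pipeline(bots)
```

With a rule like this, only the destinations have to be written down.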

> In addition to that, to make the setup of IntelMQ easier, the
> defaults.conf should be dropped. Default values should be set in the
> Bot classes respectively in the IntelMQ process managers, but there
> is no need for a separate file.

The defaults.conf seems to be used to offer a single place to change
options shared by many bots (e.g. http_user_agent) at once.
If there are options for which one common value across an installation
and all its bots is useful, this functionality has to be kept somewhere
central.

I understood that the new place for this would be a global configuration
file, which contains what defaults.conf had. This would just be a renaming
if there weren't other things in the file.
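
To illustrate what I mean by keeping the functionality: a rough sketch,
assuming a single file with a reserved "global" section and one section
per bot. The section name, the "parameters" key and the values are made
up for this example:

```python
# Hypothetical content of one global configuration file after parsing.
config = {
    "global": {"http_user_agent": "ExampleAgent/1.0", "http_timeout_sec": 30},
    "example-collector": {
        "parameters": {"http_timeout_sec": 5},
    },
    "example-parser": {"parameters": {}},
}


def effective_parameters(config: dict, bot_id: str) -> dict:
    """Start from the shared values, then let the bot's own values win."""
    merged = dict(config.get("global", {}))
    merged.update(config[bot_id].get("parameters", {}))
    return merged


collector = effective_parameters(config, "example-collector")
parser = effective_parameters(config, "example-parser")
```

Changing http_user_agent in the "global" section still reaches every bot
that does not set its own value.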

The old pipeline.conf holds the wiring, which has an effect that goes
beyond one bot. As it connects bots, it may be interesting to have it in
one place to check for consistency.
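
Such a check could be as small as the following sketch. It assumes that
each bot reads from a queue named "<bot id>-queue" and lists its
destinations under a key "destination_queues"; both are assumptions of
this example:

```python
def find_dangling_destinations(bots: dict) -> list:
    """Return (bot id, queue) pairs whose destination queue has no reader."""
    known_queues = {f"{bot_id}-queue" for bot_id in bots}
    problems = []
    for bot_id, conf in sorted(bots.items()):
        for queue in conf.get("destination_queues", []):
            if queue not in known_queues:
                problems.append((bot_id, queue))
    return problems


# Hypothetical wiring with one typo in a destination queue name.
wiring = {
    "example-collector": {"destination_queues": ["example-praser-queue"]},
    "example-parser": {"destination_queues": ["file-output-queue"]},
    "file-output": {"destination_queues": []},
}

problems = find_dangling_destinations(wiring)
```

The check needs the wiring of all bots at once, wherever it is stored.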

> Another question is, if every bot should have their own
> configuration file. 

What would be the use case for this?
 #552 (packaging) does not mandate this if general default
 values are in the source code of the bots. (It would mandate it
 if bots had to come with an example config file to be useful.)

Again, one aspect to look at is what we want to do with the
configuration files. One use case is:
We want to check the whole configuration for consistency.
For this it makes sense that a lot is known about the configuration
parameters, and to me the best way to specify this is as part of the
source code of the bots, using Python code and type information.
This way even more complex requirements for config values can be
expressed using Python functions, and a dynamic consistency check could
use this code.
Thus the code for bot-specific configuration parameters should be
close to the bot itself.
(And if there are parameters that bots share, they can be in the super
class or abstract class that comes with IntelMQ (core).)
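
A minimal sketch of that approach, assuming parameters are declared as
annotated class attributes. The class names, parameter names and the
check_parameters/check_values methods are hypothetical and not the
existing IntelMQ API:

```python
from typing import get_type_hints


class Bot:
    """Stand-in for a core base class; shared parameters are declared here."""

    http_user_agent: str = "ExampleAgent/1.0"

    @classmethod
    def check_parameters(cls, values: dict) -> list:
        """Check names and types against the annotations, then bot rules."""
        hints = get_type_hints(cls)  # includes annotations of base classes
        errors = []
        for name, value in values.items():
            if name not in hints:
                errors.append(f"{name}: unknown parameter")
            elif not isinstance(value, hints[name]):
                errors.append(
                    f"{name}: expected {hints[name].__name__}, "
                    f"got {type(value).__name__}"
                )
        errors.extend(cls.check_values(values))
        return errors

    @classmethod
    def check_values(cls, values: dict) -> list:
        """Hook for requirements that types alone cannot express."""
        return []


class ExampleCollectorBot(Bot):
    url: str = ""
    timeout: int = 30

    @classmethod
    def check_values(cls, values: dict) -> list:
        timeout = values.get("timeout", cls.timeout)
        if isinstance(timeout, int) and timeout <= 0:
            return ["timeout: must be positive"]
        return []
```

A checker for the whole configuration could then call check_parameters()
for each configured bot without starting it.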

Okay, #552 would want an uninstallation method, which can be implemented
against a joined configuration storage as well.

> Some users wish to be able to start a bot 
> without having to rely on IntelMQ, 

Why? How can a bot with access to the IntelMQ queues be useful?
I can imagine some janitor functionality, like refreshing an external
data source format from time to time, which needs parameters
that the real bot also needs. Anyhow, this could be seen as not being the
bot itself; it would just share config values.

If parsing of the central IntelMQ storage were in a library,
then those assistant modules could just read the config without
starting or stopping other parts of IntelMQ.

> If we want to support the request to be able to pass individual
> configurations to bots,

Why would I run a bot that affects the IntelMQ network with different
parameters? I would have to make sure to stop the bot that runs with
the real parameters.

> This individual configuration file would also allow a 
> bot to be run in a docker environment without having to set any
> environment variables. 

The bots would still have to access the commonly set parameters.
Interlude:
  https://12factor.net/config
argues that using environment variables is a good pattern
for running application parts ("apps") in different containers.
The wiring then happens outside, of course.
The idea is, if you need a different set of configuration,
just fire up a container with it.
(I am not necessarily convinced of this pattern, leading to this comment
https://github.com/Intevation/intelmq-fody-backend/blob/ad7a88022bdeadf3461ab63ba8b6327013ec8772/tickets_api/tickets_api/serve.py#L90
)

> This would make configuration handling 
> probably easier, because then configuration settings could be stored
> in a file (and managed by a configuration management system) 

Several central configuration files could also be handled in an SCM.
Of course, the diff for a single bot cannot be seen more easily,
if it is just one file that is read.

> Proposal:
>
> * IntelMQ gets one global configuration file for all the bots and
>    the pipeline.conf will be removed

(Then it must offer what defaults.conf makes possible.)

> * Every bot handles 0 to n `-c /path/to/configurationfile.$ext`
>    flags, which are treated the same way as the global configuration
>    file.

A complication I'd only do with a relevant use case.

> * Every bot also consults the environment and the values that are
>    set their overwrite the values in any configuration file

Same here.
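
For the discussion, this is how I understand the proposed precedence,
as a sketch: global file first, then each -c file in the order given,
then the environment. The "INTELMQ_" prefix and the mapping of variable
names to lower-case parameter names are my assumptions; the proposal
does not specify them:

```python
import argparse


def merge_layers(layers: list) -> dict:
    """Merge parsed configuration files; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def apply_environment(params: dict, environ: dict, prefix: str = "INTELMQ_") -> dict:
    """Let prefixed environment variables override any file value."""
    result = dict(params)
    for key, value in environ.items():
        if key.startswith(prefix):
            # Environment values are always strings; no type conversion here.
            result[key[len(prefix):].lower()] = value
    return result


# "-c" may be given 0 to n times; argparse collects the paths in order.
parser = argparse.ArgumentParser()
parser.add_argument("-c", dest="config", action="append", default=[])
args = parser.parse_args(["-c", "first.json", "-c", "second.json"])

# Already parsed stand-ins for the global file and the two -c files.
global_file = {"http_user_agent": "ExampleAgent/1.0", "timeout": 30}
first = {"timeout": 10}
second = {"url": "https://example.invalid/feed"}

params = merge_layers([global_file, first, second])
params = apply_environment(params, {"INTELMQ_TIMEOUT": "5", "HOME": "/root"})
```

Note that the environment layer loses type information, which would have
to be handled somewhere.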

>
> * There are also configuration files which list settings that are
>    not bot specific, i.e. via a reserved key default (successor of
>    the defaults.conf file) or group:id, those are also handled like
>    other configuration files, but the bot does not compare its name to
>    the key of the configuration.

So additional defaults.conf files? (I guess I do not fully understand the
idea.)

> All the evaluated configuration formats provide the possibility to
> arrange the configuration parameters in hierarchies. To make the
> configuration files more readable

This seems part of the format discussion mostly.
(A file per bot saves one level in the file, making a single file easier
to read.)

> In an ideal setup, the bot should be totally
> indifferent as to if it runs in a Docker container, on bare metal,
> in a SystemD unit file or with SupervisorD. 

I agree in principle.
A potential solution is: the process manager could extract
all the configuration settings and export them all as environment
variables. This way the central configuration files (which exist in all
proposed variants) do not have to be shipped to the container, so
filesystem access would not be mandatory, only access to redis and
whatever other resources a bot needs.
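
A sketch of both sides of such an export, assuming an "INTELMQ_" prefix,
lower-case parameter names and JSON-encoded values so that types survive.
All three are assumptions of this example:

```python
import json

PREFIX = "INTELMQ_"  # hypothetical prefix


def export_parameters(params: dict) -> dict:
    """Process manager side: turn parameters into environment variables."""
    return {PREFIX + name.upper(): json.dumps(value) for name, value in params.items()}


def import_parameters(environ: dict) -> dict:
    """Bot side: rebuild the parameters from the environment."""
    return {
        key[len(PREFIX):].lower(): json.loads(value)
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }


params = {"http_user_agent": "ExampleAgent/1.0", "timeout": 30, "verify": True}
environment = export_parameters(params)
restored = import_parameters({**environment, "PATH": "/usr/bin"})
```

The process manager would pass the result of export_parameters(), merged
into its own environment, when it starts the bot process or container.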

Thinking about this, we could make a redis configuration / control queue
and then bots would only need to connect to the queue system and then request
their current configuration from there. (File that idea in folder *crazy*, it 
is getting close to end of business here. ;) )

Overall I've observed much good thinking while reading the storage part of
the proposal. The whole problem space does not really segment itself nicely
in my head so far, which is a sign that things are more involved than at
first sight. I hope my mixture of questions and thoughts helps to make it
better!

Best Regards,
Bernhard

-- 
www.intevation.de/~bernhard   +49 541 33 508 3-3
Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998
Geschäftsführer Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner