Hi,
On Thursday, 10 December 2020 13:17:45, Birger Schacht wrote:
This part is about the question: where do we store the configuration?
Overall, I miss the use cases or problems that the proposed changes should address. A problem description and links to discussions that have already taken place would make it easier to comment on the proposal.
Some relevant places that describe wishes, status and suggestions:
- https://intelmq.readthedocs.io/en/latest/user/bots.html#common-parameters
- https://intelmq.readthedocs.io/en/latest/user/configuration-management.html
- https://github.com/certtools/intelmq/issues/267 (Configurations - Hierarchy configurations), closed
- https://github.com/certtools/intelmq/issues/552 (Enable separate packaging of bots by allowing addition and removals to the config)
The ideas document[^8] on GitHub already proposes to remove the pipeline.conf and specifying the destination pipelines in the individual bot configuration part. The declaration of the source queue can be dropped then as well, as it follows a rule anyway.
The idea sounds useful for decreasing the size of the configuration. (Making something easier to understand is always a use case.)
In addition to that, to make the setup of IntelMQ easier, the defaults.conf should be dropped. Default values should be set in the Bot classes respectively in the IntelMQ process managers, but there is no need for a separate file.
The defaults.conf seems to be used to offer a single place to change options shared by many bots (e.g. http_user_agent) at once. If there are options for which a common value across a single installation and its bots is useful, that functionality has to be kept somewhere central.
I understood that the new place for this would be a global configuration file containing what defaults.conf had. This would be just a renaming if there weren't other things in the file.
The old pipeline.conf holds the wiring, which has an effect that goes beyond a single bot. As it connects bots, it may be interesting to have it in one place to check for consistency.
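To illustrate what such a consistency check on the wiring could look like once the destinations live in the bot configuration, here is a minimal sketch. The bot ids, the `destination_queues` key and the rule "source queue is `<bot-id>-queue`" are assumptions made for the example, not a statement about how IntelMQ stores this today.

```python
from typing import Dict, List

# Hypothetical merged configuration: each bot only names its destination
# queues; the source queue is derived by rule (assumed: "<bot-id>-queue").
BOTS: Dict[str, Dict[str, List[str]]] = {
    "example-collector": {"destination_queues": ["example-parser-queue"]},
    "example-parser": {"destination_queues": ["example-output-queue"]},
    "example-output": {"destination_queues": []},
}


def source_queue(bot_id: str) -> str:
    """Derive the source queue name from the bot id (assumed rule)."""
    return f"{bot_id}-queue"


def check_wiring(bots: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return one message per destination queue that no bot reads from."""
    known = {source_queue(bot_id) for bot_id in bots}
    problems = []
    for bot_id, conf in bots.items():
        for queue in conf.get("destination_queues", []):
            if queue not in known:
                problems.append(
                    f"{bot_id}: destination {queue!r} has no consuming bot"
                )
    return problems
```

The check needs to see all bots at once, which is the point about wiring going beyond a single bot: wherever the values are stored, something has to collect them in one place before checking.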
Another question is, if every bot should have their own configuration file.
What would be the use case for this? Packaging (#552) does not mandate this if general default values are in the source code of the bots. (It would mandate it if bots had to come with an example config file to be useful.)
Again, one aspect to consider is what we want to do with the configuration files. One use case is that we want to check the whole configuration for consistency. For this it makes sense that a lot is known about the configuration parameters, and to me the best way to specify this is as part of the bots' source code, using Python code and type information. This way even more complex requirements for config values can be expressed as Python functions, and a dynamic consistency check could use this code. Thus the code for a bot's specific configuration parameters should be close to the bot itself. (And if there are parameters they share, these can live in the superclass or abstract class that comes with IntelMQ (core).)
Okay, #552 would want an uninstallation method, which can be implemented against a joined configuration storage as well.
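A minimal sketch of what I mean by parameters declared in the bot's source code with type information. The classes `Bot` and `ExampleCollectorBot`, the parameter names and the `check_config` / `check_values` methods are invented for illustration; this is not the existing `intelmq.lib.bot.Bot` API.

```python
from typing import Any, Dict, List, get_type_hints


class Bot:
    """Stand-in for a core base class; NOT the real intelmq.lib.bot.Bot."""

    # Shared parameters with their defaults live in the superclass.
    http_user_agent: str = "example-agent/1.0"
    rate_limit: int = 0

    @classmethod
    def defaults(cls) -> Dict[str, Any]:
        """Collect the default of every annotated parameter, inherited ones too."""
        return {name: getattr(cls, name) for name in get_type_hints(cls)}

    @classmethod
    def check_values(cls, params: Dict[str, Any]) -> List[str]:
        """Hook for more complex requirements; subclasses may override it."""
        return []

    @classmethod
    def check_config(cls, params: Dict[str, Any]) -> List[str]:
        """Check names and types first, then run the value checks."""
        hints = get_type_hints(cls)  # only plain classes (str, int) handled here
        problems = []
        for key, value in params.items():
            if key not in hints:
                problems.append(f"unknown parameter {key!r}")
            elif not isinstance(value, hints[key]):
                problems.append(
                    f"{key!r} should be {hints[key].__name__}, "
                    f"got {type(value).__name__}"
                )
        if not problems:
            problems.extend(cls.check_values({**cls.defaults(), **params}))
        return problems


class ExampleCollectorBot(Bot):
    # Bot-specific parameters are declared next to the bot's code.
    url: str = ""
    timeout: int = 30

    @classmethod
    def check_values(cls, params: Dict[str, Any]) -> List[str]:
        if params["timeout"] <= 0:
            return ["'timeout' must be positive"]
        return []
```

A checker for the whole installation could then call `check_config` on each bot class with the values found in the configuration storage, regardless of whether that storage is one file or many.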
Some users wish to be able to start a bot without having to rely on IntelMQ,
Why? How can a bot with access to the IntelMQ queues be useful? I can imagine some janitor functionality, like refreshing an external data source format from time to time, which needs parameters that the real bot also needs. In any case, this could be seen as not being the bot itself; it would just share config values.
If parsing of the central IntelMQ storage were in a library, then such an assistant module could just read the config without starting or stopping other parts of IntelMQ.
If we want to support the request to be able to pass individual configurations to bots,
Why would I run a bot that affects the IntelMQ network with different parameters? I would have to make sure to stop the bot running with the real parameters.
This individual configuration file would also allow a bot to be run in a docker environment without having to set any environment variables.
The bots would still have to access the commonly set parameters.
Interlude: https://12factor.net/config argues that using environment variables is a good pattern for running application parts ("apps") in different containers. The wiring happens outside, of course. The idea is: if you need a different set of configuration, just fire up a container with it. (I am not necessarily convinced of this pattern, which led to this comment: https://github.com/Intevation/intelmq-fody-backend/blob/ad7a88022bdeadf3461a... )
This would make configuration handling probably easier, because then configuration settings could be stored in a file (and managed by a configuration management system)
Several central configuration files could also be handled in an SCM. Of course, the diff for a single bot cannot be seen as easily if just one file is read.
Proposal:
- IntelMQ gets one global configuration file for all the bots and
the pipeline.conf will be removed
(Then it must offer what defaults.conf makes possible.)
- Every bot handles 0 to n `-c /path/to/configurationfile.$ext`
flags, which are treated the same way as the global configuration file.
A complication I'd only add given a relevant use case.
- Every bot also consults the environment and the values that are
set there override the values in any configuration file
Same here.
- There are also configuration files which list settings that are
not bot specific, i.e. via a reserved key default (successor of the defaults.conf file) or group:id, those are also handled like other configuration files, but the bot does not compare its name to the key of the configuration.
So additional defaults.conf files? (I guess I do not fully understand the idea.)
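To make sure I read the four proposal points correctly, here is how I would sketch the resulting lookup order. Everything specific in it is an assumption of mine: the documents are already-parsed dictionaries (the file format is a separate discussion), the reserved key is `default`, the environment variable prefix `INTELMQ_` is invented, and I chose that all `default` sections apply before any bot-specific section.

```python
from typing import Any, Dict, Mapping, Sequence


def resolve_config(
    bot_id: str,
    documents: Sequence[Mapping[str, Mapping[str, Any]]],
    environ: Mapping[str, str],
    env_prefix: str = "INTELMQ_",  # hypothetical prefix
) -> Dict[str, Any]:
    """Merge the global file, any extra -c files and the environment.

    `documents` is ordered: global configuration first, then each file
    passed with -c. Later documents win over earlier ones.
    """
    params: Dict[str, Any] = {}
    # 1. Settings that are not bot specific (reserved key "default").
    for doc in documents:
        params.update(doc.get("default", {}))
    # 2. Sections whose key matches this bot's id.
    for doc in documents:
        params.update(doc.get(bot_id, {}))
    # 3. The environment overrides any file value. Only parameters that
    #    are already known are looked up; values stay plain strings.
    for key in list(params):
        env_name = env_prefix + key.upper()
        if env_name in environ:
            params[key] = environ[env_name]
    return params
```

Even in this small form there are three layers and an ordering question inside the first two, which is why I would like to see the use cases before adding the flags and the environment lookup.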
All the evaluated configuration formats provide the possibility to arrange the configuration parameters in hierarchies. To make the configuration files more readable
This seems to be mostly part of the format discussion. (A file per bot saves one level of hierarchy in the file, making a single file easier to read.)
In an ideal setup, the bot should be totally indifferent as to if it runs in a Docker container, on bare metal, in a SystemD unit file or with SupervisorD.
I agree in principle. A potential solution: the process manager could extract all the configuration settings and export them as environment variables. This way the central configuration files (which exist in all proposed variants) do not have to be shipped to the container, so filesystem access would not be mandatory, only access to Redis and whatever other resources a bot needs.
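A minimal sketch of that export step, assuming a made-up variable prefix `INTELMQ_PARAM_`, lowercase parameter names, and JSON encoding of the values so that types survive the trip through the environment. None of this exists in IntelMQ as far as I know.

```python
import json
from typing import Any, Dict, Mapping

PREFIX = "INTELMQ_PARAM_"  # hypothetical prefix for exported settings


def export_to_environ(params: Mapping[str, Any]) -> Dict[str, str]:
    """Process manager side: turn resolved settings into environment strings."""
    return {PREFIX + key.upper(): json.dumps(value) for key, value in params.items()}


def import_from_environ(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Bot side: rebuild the settings from the environment only."""
    return {
        name[len(PREFIX):].lower(): json.loads(value)
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }
```

The process manager could pass the result to the child process, for instance as `env={**os.environ, **export_to_environ(params)}` with `subprocess.Popen`, or hand it to whatever starts the container. The bot would then call `import_from_environ(os.environ)` and never touch a configuration file.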
Thinking about this, we could make a Redis configuration / control queue, and then bots would only need to connect to the queue system and request their current configuration from there. (File that idea in the folder *crazy*; it is getting close to end of business here. ;) )
Overall I've observed much good thinking while reading the storage part of the proposal. The whole problem space does not really segment itself nicely in my head yet, which is a sign that things are more involved than they appear at first sight. I hope my mixture of questions and thoughts helps to make it better!
Best Regards, Bernhard