[IntelMQ-dev] storage of config (Re: [Intelmq-dev] IEP01: IntelMQ Configuration Handling)

Tue Jan 26 20:04:22 CET 2021

Hi.

On 1/22/21 5:26 PM, Bernhard Reiter wrote:
>> This part is about the question: where do we store the
>> configuration?
> overall I do miss the use cases or problems 
> that should be addressed by the proposed changes.
> Having a problem description and links to discussion that have already taken 
> place, would make it easier to comment on the proposal.
>
> Some relevant places that describe wishes, status and suggestions:
>   https://intelmq.readthedocs.io/en/latest/user/bots.html#common-parameters
>   https://intelmq.readthedocs.io/en/latest/user/configuration-management.html
>   https://github.com/certtools/intelmq/issues/267
>     (Configurations - Hierarchy configurations) closed
>   https://github.com/certtools/intelmq/issues/552
>     (Enable separate packaging of bots by allowing addition and removals to 
> the config)
Plus
https://github.com/certtools/intelmq/issues/570 "configuration format"
https://github.com/certtools/intelmq/issues/121 "Configuration Files"
(closed, but not all ideas were implemented)
https://github.com/certtools/intelmq/issues/1026 "Proposal: use template
library for JSON configs" (not addressed by this proposal)
https://github.com/certtools/intelmq/issues/1580 "Some parameters with
default values throw AttributeError when not set"
and related to the BOTS file:
https://github.com/certtools/intelmq/issues/440 "Installing custom Bots"
https://github.com/certtools/intelmq/issues/1646 "Run custom bot"
https://github.com/certtools/intelmq/issues/552 "Enable separate
packaging of bots by allowing addition and removals to the config."
https://github.com/certtools/intelmq/issues/757 "Clearly define all
parameters used in a bot"
https://github.com/certtools/intelmq/issues/668 "Very long BOTS file"
https://github.com/certtools/intelmq/issues/644 "Errors when already
configured bots gain additional options through upgrade"
https://github.com/certtools/intelmq/issues/908 "Parameter from BOTS
does'nt passed to a new bot"

But none of them directly matches the proposal, and most are addressed by
the "Internal handling" section of the proposal. Our proposal is also
based on last year's requirements collection and extended to match the
behavior of other tools (`-c` parameter) or to add some handy usability
tricks like setting parameters with `-p` (useful for debugging &
testing). So, besides the examples given or linked in the proposal
itself, there are not many more use cases.

Our intention was also to *start* a discussion with the proposal in the
first place, but so far the discussion has mainly focused on one aspect.
One lesson learned from this is to split proposals into smaller parts
and not to group them too much.

>> In addition to that, to make the setup of IntelMQ easier, the
>> defaults.conf should be dropped. Default values should be set in the
>> Bot classes respectively in the IntelMQ process managers, but there
>> is no need for a separate file.
> The default.conf seems to be used to offer a single place to change
> options shared by many bots (e.g. http_user_agent) at once.
> If options exist where a common value for a single installation
> and their bots is useful the functionality has to be kept somewhere
> central. 
>
> I understood the new place for this would be in a global configuration file,
> which contains what default.conf had. This would just be a renaming if there 
> weren't other things in the file.
It's more than renaming, it's also a cleanup. As the IntelMQ-default
values go into the code, that file (or section in a file) only needs to
carry those default values which are set by the administrator and differ
from IntelMQ's defaults. So the default-files of most installations can
be either dropped or will shrink significantly.
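
As a rough sketch of that cleanup, assume the defaults become plain class attributes and the administrator's file only lists deviations. The class `HTTPCollectorSketch`, its parameter values and the helper `effective_parameters` are hypothetical and not taken from IntelMQ:

```python
class HTTPCollectorSketch:
    # Hypothetical bot: the shipped defaults live in the code itself.
    http_timeout_sec = 30
    http_user_agent = "IntelMQ"
    rate_limit = 0


def effective_parameters(bot_class, admin_overrides):
    """Merge the defaults defined on the class with the administrator's overrides."""
    defaults = {
        name: value
        for name, value in vars(bot_class).items()
        if not name.startswith("_") and not callable(value)
    }
    # Later entries win, so the administrator's values replace the defaults.
    return {**defaults, **admin_overrides}


# The administrator's file only carries the values that differ from the defaults.
params = effective_parameters(HTTPCollectorSketch, {"http_user_agent": "MyCERT/1.0"})
print(params)
```

With this layout, an installation that is happy with all shipped defaults needs no defaults file at all.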
>> Another question is, if every bot should have their own
>> configuration file. 
> What would be the use case for this?
>  #552 packaging does not mandate this, if general default
>  values are in the source code of bots. (It would mandate it,
>  if bots had to come with an example config file to be useful.)

The question/proposal is based on a use-case identified by the
requirements collection:

https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architecture-3.0.md#user-content-configuration-files

> be on a per-program-basis (one config file per "bot"). The config
> files per program shall reside in $base/etc/config.d/ and follow the
> common linux standards.

The proposal to use the -c parameter for this covers the use-case, but
is more generic. For example it can be handy for Docker-setups as well,
as described in the initial mail.
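
To make the idea more concrete, here is a minimal sketch of how a bot's command line could accept `-c` and `-p`. The `KEY=VALUE` syntax for `-p`, the long option names, the file path and the function `parse_bot_args` are assumptions for illustration only; the proposal does not fix any of them:

```python
import argparse
import json


def parse_bot_args(argv):
    """Parse a hypothetical bot command line: BOT_ID [-c FILE]... [-p KEY=VALUE]..."""
    parser = argparse.ArgumentParser(prog="bot-sketch")
    parser.add_argument("bot_id")
    parser.add_argument("-c", "--config", action="append", default=[],
                        help="configuration file to load (may be repeated)")
    parser.add_argument("-p", "--parameter", action="append", default=[],
                        metavar="KEY=VALUE", help="override a single parameter")
    args = parser.parse_args(argv)

    overrides = {}
    for item in args.parameter:
        key, sep, raw = item.partition("=")
        if not sep:
            parser.error(f"expected KEY=VALUE, got {item!r}")
        try:
            # Interpret numbers, booleans etc. as JSON; fall back to a plain string.
            overrides[key] = json.loads(raw)
        except json.JSONDecodeError:
            overrides[key] = raw
    return args.bot_id, args.config, overrides


print(parse_bot_args(["shodan1", "-c", "/etc/intelmq/shodan.yaml",
                      "-p", "rate_limit=5", "-p", "logging_level=DEBUG"]))
```

The same entry point would then serve the per-bot file use case, Docker setups and interactive debugging.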

> Again one aspect to look for can be what we want to do with the
> configuration files. One use case is:
> We want to check the whole configuration for consistency. 
> For this it makes sense that a lot of stuff is known about
> configuration parameters and to me the best way to specify this is
> as part of the source code of bots using Python code and type information.
> This way even more complex requirements for config values can be expressed 
> using python functions and dynamic consistency check could use this code.
> Thus the code for a bot specific configuration parameters should be 
> close to the bot itself.
Definitely. We thought about using variable typing for this, but haven't
done PoCs yet. See section "Internal handling" of the proposal.
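
Purely as an illustration of what such a check could look like (this is not the PoC, and none of the names exist in IntelMQ), annotated class attributes can be read with `typing.get_type_hints` and compared against the configured values. The sketch only handles simple types such as `str`, `int` and `bool`:

```python
from typing import get_type_hints


class ExpertBotSketch:
    # Hypothetical bot: annotated class attributes describe the parameters.
    database: str = "/tmp/example.dat"
    rate_limit: int = 0
    overwrite: bool = False


def check_parameters(bot_class, config):
    """Return a list of human-readable problems found in config."""
    hints = get_type_hints(bot_class)
    problems = []
    for name, value in config.items():
        if name not in hints:
            problems.append(f"unknown parameter {name!r}")
        elif not isinstance(value, hints[name]):
            problems.append(
                f"{name!r} should be {hints[name].__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


print(check_parameters(ExpertBotSketch, {"rate_limit": "fast", "typo": 1}))
```

A consistency check over the whole configuration could run such a function for every configured bot before anything is started.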
> (And if there are parameters they share, it can be in the super class or 
> abstract class, coming with IntelMQ (core).)
For the CollectorBot and ParserBot classes, this is already the case.
There's more potential, e.g. a HTTPBot class.
>> Some users wish to be able to start a bot 
>> without having to rely on IntelMQ, 
> Why? How can a bot with access to the IntelMQ queues be useful?
> I can imagine some janitor functionality, like refreshing an external
> datasource format from time to time and this needs parameters
> that the real bot also needs. Anyhow could be seen as not being the bot 
> itself, it would just be shared config values.
I don't have more details on this use-case. But this use-case is covered
by the more generic idea to have a -c parameter to load configuration files.
>> If we want to support the request to be able to pass individual
>> configurations to bots,
> Why would I run a bot that affects the IntelMQ network
> to be run with different parameters? I have to make sure to stop the bot with 
> the real parameters.
When running bots interactively for testing and debugging, this would be
very handy. It's the operator's responsibility to stop the bot after
starting it with deviating parameters.
>> This individual configuration file would also allow a 
>> bot to be run in a docker environment without having to set any
>> environment variables. 
> The bots would still have to access the commonly set parameters.
Not if the commonly set parameters are included in that file, or if
IntelMQ's defaults are ok.
> Interlude:
>   https://12factor.net/config
> believes that using ENVIRONMENT variables would be a good pattern
> for running application parts ("apps") in different containers.
> Wiring that happens outside, of course.
> The idea is, if you need a different set of configuration,
> just fire up a container with it.
> (I am not necessarily convinced of this pattern, leading to this comment
> https://github.com/Intevation/intelmq-fody-backend/blob/ad7a88022bdeadf3461ab63ba8b6327013ec8772/tickets_api/tickets_api/serve.py#L90
> )
This is also the best practice for Docker, leading to this part of the
proposal:
>> * Every bot also consults the environment and the values that are
>>    set there overwrite the values in any configuration file
> Same here.
The primary use-case here is Docker. In Docker, the best practice for
passing configuration to containers is to use environment variables. This
approach is partly used by the existing Docker image we created.
For now, we have only implemented this for redis_cache_host
(https://github.com/certtools/intelmq/blob/develop/intelmq/lib/bot.py#L734-L738)
as the bare minimum needed to create the Docker image.
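
A generalised version of this could look roughly like the following sketch. It is not the code linked above; the `INTELMQ_` prefix, the upper-casing rule and the function `apply_environment` are assumptions made for the example:

```python
import os


def apply_environment(params, environ=None, prefix="INTELMQ_"):
    """Return a copy of params in which matching environment variables win."""
    environ = os.environ if environ is None else environ
    merged = dict(params)
    for name in params:
        env_name = prefix + name.upper()  # assumed naming scheme
        if env_name in environ:
            # Environment values are always strings; a real implementation
            # would have to convert them to the parameter's type.
            merged[name] = environ[env_name]
    return merged


file_params = {"redis_cache_host": "127.0.0.1", "redis_cache_port": 6379}
container_env = {"INTELMQ_REDIS_CACHE_HOST": "redis"}
print(apply_environment(file_params, container_env))
```

Only parameters the bot already knows are looked up, so unrelated environment variables cannot inject unknown settings.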
>> * There are also configuration files which list settings that are
>>    not bot specific, i.e. via a reserved key default (successor of
>>    the defaults.conf file) or group:id, those are also handled like
>>    other configuration files, but the bot does not compare its name to
>>    the key of the configuration.
> So additional default.conf files? (I guess I do not fully understand the 
> idea.)

In order to get rid of the separate defaults.conf file, the proposal
lists two solutions:

* the reserved key "default" (or similar). For example, the
configuration file could look like this:
```
- shodan1:
    module: intelmq.bots.collectors.shodan.collector
- mylittlebot23:
    module: intelmq.bots.expert.asn_lookup.expert
    http:
      proxy: http://myproxy.tld:80
- default:
    http:
      proxy: http://mydefault.proxy.intern:8080
```

* The other, *additional* solution is the group defaults. The example
given in the proposal is:
```
- group:collectors:
    http:
      proxy: http://thirdparty.proxy.tld:9000
```

This would be a new feature and can be handy, e.g. for rate_limit or
error handling parameters.
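
How a bot could combine these entries is sketched below, with the configuration already loaded into Python data. The precedence (reserved `default` key first, then `group:<name>`, then the bot's own entry) and the way the group name is passed in are assumptions; the proposal does not spell them out, and `resolve` and `deep_merge` are hypothetical helpers:

```python
def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def resolve(bot_id, group, entries):
    """Apply 'default', then 'group:<group>', then the bot's own entry."""
    merged = {}
    for key in ("default", f"group:{group}", bot_id):
        for entry in entries:
            merged = deep_merge(merged, entry.get(key) or {})
    return merged


# The two examples from this mail, as they might look after parsing.
entries = [
    {"shodan1": {"module": "intelmq.bots.collectors.shodan.collector"}},
    {"mylittlebot23": {"module": "intelmq.bots.expert.asn_lookup.expert",
                       "http": {"proxy": "http://myproxy.tld:80"}}},
    {"default": {"http": {"proxy": "http://mydefault.proxy.intern:8080"}}},
    {"group:collectors": {"http": {"proxy": "http://thirdparty.proxy.tld:9000"}}},
]

print(resolve("shodan1", "collectors", entries))
print(resolve("mylittlebot23", "experts", entries))
```

Under these assumptions, shodan1 ends up with the collectors' proxy, while mylittlebot23 keeps the proxy from its own entry.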

>> In an ideal setup, the bot should be totally
>> indifferent as to if it runs in a Docker container, on bare metal,
>> in a SystemD unit file or with SupervisorD. 
> I agree in principle.
> A potential solution is: the process manager could extract
> all the configuration settings and export them all in environment variables.
> This way the central configuration files (which were existing in all proposed 
> variants) do not have to be shipped to the container, so filesystem access 
> would not be mandatory, only access to redis and whatever other resources a 
> bot needs.
That's actually one of the possibilities: deploy every bot in its own
Docker container and let the central orchestration component pass the
parameters to the containers. However, this can be addressed later.
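
For illustration only, the exporting side could flatten a bot's parameters into environment variable names along these lines. The `INTELMQ_` prefix, the underscore joining of nested keys and the function `to_environment` are assumptions, not something the proposal defines:

```python
def to_environment(params, prefix="INTELMQ_"):
    """Flatten nested parameters into a dict of environment variables."""
    env = {}
    for key, value in params.items():
        name = prefix + key.upper()
        if isinstance(value, dict):
            # Nested sections become NAME_SUBKEY entries.
            env.update(to_environment(value, name + "_"))
        else:
            # Environment variables can only carry strings.
            env[name] = str(value)
    return env


bot_params = {"rate_limit": 5, "http": {"proxy": "http://myproxy.tld:80"}}
print(to_environment(bot_params))
```

The resulting dictionary could be handed to whatever starts the container, so the bot itself would not need access to the central configuration files.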
> Thinking about this, we could make a redis configuration / control queue
> and then bots would only need to connect to the queue system and then request
> their current configuration from there. (File that idea in folder *crazy*, it 
> is getting close to end of business here. ;) )
I wouldn't call it crazy, but radical.
> Overall I've observed much good thinking while reading the storage part of the 
> proposal part. The whole problem space does not really segment itself nicely 
> in my head up to now, which is a sign that things are more involved than at 
> first sight. Hope my mixture of questions and thoughts helps to make it 
> better!

Thank you for all your valuable feedback, insights and thoughts. We are
very thankful for your detailed responses!

best regards
Sebastian

-- 
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
