Hi,
We need consistent behavior for extracting files from downloaded archives. For this, I'd like to hear some opinions from users. What do you want to be able to configure, and what should be done automatically? Do you want it to be automatic but still have the possibility to override it? There are some possible settings:

* Whether to extract at all.
* What to extract: some files vs. everything. This can be combined with the setting above.
* Archive type. It could be guessed from the filename extension or the MIME type. The latter is not as trivial in Python as I expected :/
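To make the archive type question concrete, here is a rough sketch of the two ways of guessing, using only the standard library. The function names `guess_from_filename` and `guess_from_content` and the returned strings are made up for illustration and are not part of IntelMQ. The second function sidesteps the MIME type lookup by simply trying to open the downloaded bytes, which is only one possible approach.

```python
import io
import tarfile
import zipfile


def guess_from_filename(filename):
    """Guess the archive type from the filename extension alone (hypothetical helper)."""
    name = filename.lower()
    if name.endswith(".zip"):
        return "zip"
    if name.endswith((".tar.gz", ".tgz")):
        return "tar.gz"
    return None


def guess_from_content(raw):
    """Guess the archive type by trying to open the downloaded bytes (hypothetical helper)."""
    if zipfile.is_zipfile(io.BytesIO(raw)):
        return "zip"
    try:
        # Opening succeeds only if the data is a gzip-compressed tar archive.
        with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz"):
            return "tar.gz"
    except (tarfile.TarError, OSError, EOFError):
        return None
```

Guessing from the content does not depend on the URL or attachment having a meaningful name, but it costs an extra pass over the data. Guessing from the extension is cheap but trusts the sender.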
Background: The HTTP collector can currently extract files from zip archives on the fly. There is no parameter for this; all files are passed on as separate reports. The RT collector can extract zip archives on the fly if the parameter `unzip_attachment` is true. PR#1095[0] adds the ability to extract files from tar.gz archives, along with a parameter `extract_files` that takes a list of filenames to extract. If the parameter is simply True, all files are extracted.
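For discussion, the `extract_files` semantics described above (True for everything, a list for selected filenames) could look roughly like the sketch below. This is not the code from the PR. The function name `extract_tar_members` is made up, and treating a false or empty value as "extract nothing" is my assumption.

```python
import io
import tarfile


def extract_tar_members(raw, extract_files=True):
    """Return {filename: bytes} for the wanted regular files in a tar.gz archive.

    Hypothetical helper: extract_files=True means all files, a list means
    only the named files. A false/empty value returning nothing is an assumption.
    """
    if not extract_files:
        return {}
    wanted = None if extract_files is True else set(extract_files)
    result = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as archive:
        for member in archive:
            # Skip directories, links and other special entries.
            if not member.isfile():
                continue
            if wanted is not None and member.name not in wanted:
                continue
            result[member.name] = archive.extractfile(member).read()
    return result
```

Each entry of the returned mapping would then correspond to one report, in the same way the HTTP collector passes on each file of a zip archive separately.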
Sebastian
[0]: https://github.com/certtools/intelmq/pull/1095