[Intelmq-users] Automatic extraction for archives

Sebastian Wagner wagner at cert.at
Mon Oct 2 17:29:36 CEST 2017


Hi,

We need consistent behavior for extracting files from downloaded
archives. For this, I'd like to hear some opinions from users. What do
you want to be able to configure, and what should be done automatically?
Do you want it to be automatic but still have the possibility to
override it?
There are some possible settings:
* Whether to extract at all
* What to extract: some files or everything. Can be combined with the
above
* Archive type. Could be guessed from the filename extension or mimetype.
The latter is not as trivial in Python as I expected :/
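
To make the archive type question concrete, here is a minimal sketch
of guessing the type with the standard library only. The function name
`guess_archive_type` and its return values are made up for this
illustration and are not part of IntelMQ. It assumes the whole download
is available in memory and prefers looking at the content over trusting
the filename:

```python
import io
import mimetypes
import tarfile
import zipfile


def guess_archive_type(filename=None, data=None):
    """Return 'zip', 'tar' or None (hypothetical helper, not IntelMQ API).

    The content, if given, is checked first; the filename is the fallback.
    """
    if data is not None:
        if zipfile.is_zipfile(io.BytesIO(data)):
            return 'zip'
        try:
            # mode 'r:*' lets tarfile detect the compression by itself
            with tarfile.open(fileobj=io.BytesIO(data), mode='r:*'):
                return 'tar'
        except tarfile.TarError:
            pass  # not a tar archive, fall back to the filename
    if filename:
        # guess_type returns (type, encoding), e.g. for 'x.tar.gz'
        # the type is 'application/x-tar' and the encoding 'gzip'
        mimetype, _encoding = mimetypes.guess_type(filename)
        if mimetype in ('application/zip', 'application/x-zip-compressed'):
            return 'zip'
        if mimetype == 'application/x-tar':
            return 'tar'
    return None
```

Checking the content first means a wrong or missing file extension does
not matter, at the cost of trying to open the data once or twice.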

Background:
The HTTP collector can currently extract files from zip files on the
fly. There is no parameter for this; all files are passed on as
separate reports.
The RT collector can extract zip files on the fly if the parameter
`unzip_attachment` is true.
PR#1095[0] adds the ability to extract files from tar.gz archives,
including a parameter `extract_files` that takes a list of filenames to
be extracted. If the parameter is simply True, all files are extracted.
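
As a basis for discussion, the `extract_files` semantics described
above (True for everything, a list for selected files) could be
sketched roughly like this for both zip and tar archives. This is not
the code from the PR; the function name `extract_archive`, the return
format and the handling of a false value are assumptions for
illustration only:

```python
import io
import tarfile
import zipfile


def extract_archive(data, extract_files=True):
    """Return a list of (filename, content) tuples (hypothetical helper).

    extract_files: True extracts every regular file, a list of names
    extracts only those, a false value passes the data on unchanged.
    """
    if not extract_files:
        return [(None, data)]
    wanted = None if extract_files is True else set(extract_files)
    result = []
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                if wanted is None or info.filename in wanted:
                    result.append((info.filename, archive.read(info)))
    else:
        # mode 'r:*' covers plain tar as well as tar.gz
        with tarfile.open(fileobj=io.BytesIO(data), mode='r:*') as archive:
            for member in archive.getmembers():
                if not member.isfile():
                    continue
                if wanted is None or member.name in wanted:
                    content = archive.extractfile(member).read()
                    result.append((member.name, content))
    return result
```

Each returned tuple would then become one report. The files are only
read into memory and never written to disk, so member names containing
paths do no harm.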

Sebastian

[0]: https://github.com/certtools/intelmq/pull/1095

-- 
// Sebastian Wagner <wagner at cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg

