[Intelmq-dev] Reports larger than 500 MB in IntelMQ

L. Aaron Kaplan kaplan at cert.at
Fri Jul 29 13:39:25 CEST 2016


> On 29 Jul 2016, at 08:45, Dustin Demuth <dustin.demuth at intevation.de> wrote:
> 
> 
> Dear All,
> 
> It seems we have a solution for this problem now.
> 
> Bernhard has created a solution to split large CSV reports into chunks [1].
> 
> To do so, the collector (in this case the "Mail-URL-Collector", which is the
> only one affected for our use case) is extended with `generate_reports()`
> from `intelmq.lib.splitreports`.
> 
> The collector can be extended with two parameters: `chunk_size`, which
> determines the size of each chunk (I don't know the unit yet; it seems to be
> bytes), and `chunk_replicate_header`, which replicates the first line of the
> file.
> 
> 
> From my short look at the code, I see that splitreports cannot process lines
> which are comments (you might have seen those starting with a # sign).
> 
> Should this be integrated?
> 
Sounds like a very good idea.

Please send a PR. Also, concerning the implementation: did you check whether Python's csv module already supplies functions for that?

I skimmed the source code and it looks reasonable upon first inspection.
I'd prefer to see the re-use of more standard libs (whenever possible).
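
To illustrate what I mean, here is a rough sketch of chunking built only on the standard csv module. It is not the code from [1]: the names `split_csv_report` and `_row_to_text` are made up for this example, and I am assuming that the chunk size is counted in UTF-8 bytes and that the first row is the header.

```python
import csv
import io


def _row_to_text(row):
    """Serialize one parsed row back to a CSV line (hypothetical helper)."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def split_csv_report(text, chunk_size, replicate_header=True):
    """Yield CSV chunks of at most roughly chunk_size bytes.

    Hypothetical sketch, not intelmq.lib.splitreports. Rows are never
    split, so a single oversized row still ends up in its own chunk.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header_text = ""
    if replicate_header:
        first = next(reader, None)
        if first is None:
            return  # empty report, nothing to yield
        header_text = _row_to_text(first)

    header_size = len(header_text.encode("utf-8"))
    chunk, size = [], header_size
    for row in reader:
        row_text = _row_to_text(row)
        row_size = len(row_text.encode("utf-8"))
        # Start a new chunk once adding this row would exceed the limit.
        if chunk and size + row_size > chunk_size:
            yield header_text + "".join(chunk)
            chunk, size = [], header_size
        chunk.append(row_text)
        size += row_size
    if chunk:
        yield header_text + "".join(chunk)


report = "ip,port\n1.1.1.1,80\n2.2.2.2,443\n3.3.3.3,22\n"
chunks = list(split_csv_report(report, chunk_size=32))
```

Because the rows go through `csv.reader`, quoted fields containing newlines stay in one piece, which a plain line-based split would not guarantee. Comment lines starting with # are not treated specially here; they would be parsed as ordinary rows.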

Best,
a.


