On 29 Jul 2016, at 08:45, Dustin Demuth dustin.demuth@intevation.de wrote:
Dear All,
It seems we have a solution for this problem now.
Bernhard has created a solution to split large CSV reports into chunks [1].
To do so, the collector (in this case the "Mail-URL-Collector", which is the only one affected in our use case) is extended with `generate_reports()` from `intelmq.lib.splitreports`.
The collector can be configured with two new parameters: `chunk_size`, which determines the size of each chunk (I don't know the unit yet; it seems to be bytes), and `chunk_replicate_header`, which replicates the first line of the file (the header) in each chunk.
From my short look at the code, I see that splitreports cannot process lines which are comments (you might have seen those starting with a # sign).
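To illustrate the idea, here is a minimal sketch of size-based splitting with header replication. It is not the code from `intelmq.lib.splitreports`; the function name `split_report` and its parameters are hypothetical, and it assumes `chunk_size` is measured in bytes. Like the behaviour described above, it does not treat comment lines specially, so a leading `#` line would be replicated as if it were the header.

```python
def split_report(raw: bytes, chunk_size: int, replicate_header: bool = True):
    """Yield chunks of roughly chunk_size bytes, split on line boundaries.

    Hypothetical helper for illustration only; not the intelmq implementation.
    """
    lines = raw.splitlines(keepends=True)
    if not lines:
        return
    # Treat the first line as the header when replication is requested.
    header = lines[0] if replicate_header else b""
    body = lines[1:] if replicate_header else lines

    current = []
    size = len(header)
    for line in body:
        # Start a new chunk if this line would push the current one over the limit.
        # A single line longer than chunk_size still gets a chunk of its own.
        if current and size + len(line) > chunk_size:
            yield header + b"".join(current)
            current, size = [], len(header)
        current.append(line)
        size += len(line)

    # Emit the remainder, or the bare header if the report had no data lines.
    if current or not body:
        yield header + b"".join(current)


if __name__ == "__main__":
    report = b"a,b\n1,2\n3,4\n5,6\n"
    for chunk in split_report(report, chunk_size=12):
        print(chunk)
```

Splitting on line boundaries keeps every chunk a valid CSV file on its own, at the cost of chunks that are slightly smaller than the limit.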
Should this be integrated?
Sounds like a very good idea.
Please send a PR. Also, concerning the implementation: did you check whether the Python csv module already supplies functions for that?
I skimmed the source code and it looks reasonable upon first inspection. I'd prefer to see the re-use of more standard libs (whenever possible).
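For reference, a rough sketch of what a standard-library approach could look like. As far as I know, the `csv` module parses and writes rows but has no built-in chunking function, so this combines `csv.reader`/`csv.writer` with `itertools.islice`. The function name `split_rows` is hypothetical, and note that it splits by row count rather than by size.

```python
import csv
import io
from itertools import islice


def split_rows(text: str, rows_per_chunk: int):
    """Yield CSV chunks of at most rows_per_chunk data rows, each with the header.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to split
    while True:
        # Pull the next batch of parsed rows from the reader.
        rows = list(islice(reader, rows_per_chunk))
        if not rows:
            break
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        yield out.getvalue()


if __name__ == "__main__":
    for chunk in split_rows('a,b\n1,"x,y"\n2,z\n3,w\n', rows_per_chunk=2):
        print(chunk)
```

Going through the csv module means quoted fields that contain delimiters or newlines are never cut in half, which plain line-based splitting cannot guarantee.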
Best, a.