[Intelmq-dev] Reports larger than 500 MB in IntelMQ

L. Aaron Kaplan kaplan at cert.at
Fri Jul 22 23:49:01 CEST 2016


> On 22 Jul 2016, at 22:47, Otmar Lendl <lendl at cert.at> wrote:
> 
> On 22.07.2016 09:02, Dustin Demuth wrote:
>> Dear all,
>> 
>> currently we are facing the problem that IntelMQ is not capable of handling
>> large reports (> 500 MB) when using Redis as the message queuing system.
>> First we thought this might get fixed in most recent redis versions (see:
>> [1]), but apparently this is not the case.
> 
> Dustin,
> 
> 500 MB is a significant amount of data, and IMHO passing chunks of that
> size around in a RAM-based system is no longer sensible.
> 
> This does not only apply to IntelMQ: I've seen a number of code-bases
> fall over when they were confronted with large data-sets.
> 
> (Once a student wrote a tool for us to detect CMS versions based on a
> list of domains. All nice and fine for his test-data, but once we put
> the 1M .at domains in, it broke down. The "load everything into the RAM"
> approach has limits.)
> 
> IMHO:
> 
> I guess you hit this limit on some Shadowserver feeds.
> 
> A sensible approach is to add some sort of "split" option to the
> collector bots. Yeah, this is not nice and perfect, but I'd add some
> logic like
> 
> if ( sizeof(collected data) > limit ) {
>   grab header-line
>   foreach chunk of X data-lines {
>     push header-line, chunk of data-lines into REDIS queue
>   }
> } else {
>   push original_data into REDIS queue
> }
> 
> to the collector bots. And (and I haven't checked this), make sure you
> never try to hold everything in memory. Download to temp-files and
> stream-process those.
> 
> If we're dealing with anything csv-like, the only additional info that
> piece of code needs is whether
> 
> * some extra comment line needs to be stripped
> * there is a header line


+1
Totally agree... I think this is more sensible.
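
For illustration, a minimal sketch of the split logic above in Python. It
assumes the report is line-oriented text that was already downloaded to a
temp file, so it can be read lazily instead of being loaded into RAM. The
names split_report, chunk_size, has_header and skip_comment_prefix are made
up for this sketch and are not part of IntelMQ. Pushing each chunk into the
Redis queue is left to the collector bot. Note that it splits by number of
data lines rather than by byte size, which is a simplification of the
pseudocode.

```python
import io
from itertools import islice


def split_report(lines, chunk_size, has_header=True, skip_comment_prefix=None):
    """Yield report chunks as strings, each one repeating the header line.

    lines: any iterable of text lines (e.g. an open temp file); it is
    consumed lazily, so at most chunk_size data lines are held in memory.
    """
    lines = iter(lines)
    if skip_comment_prefix is not None:
        # Assumption: every line starting with this prefix is a comment.
        lines = (l for l in lines if not l.startswith(skip_comment_prefix))
    # Grab the header line once so it can be prepended to every chunk.
    header = next(lines, "") if has_header else ""
    while True:
        chunk = list(islice(lines, chunk_size))
        if not chunk:
            break
        yield header + "".join(chunk)


if __name__ == "__main__":
    # Stand-in for a large CSV report streamed from a temp file.
    report = io.StringIO(
        "# generated by feed\n"
        "ip,port\n"
        "1.2.3.4,80\n"
        "5.6.7.8,443\n"
        "9.9.9.9,53\n"
    )
    for message in split_report(report, chunk_size=2, skip_comment_prefix="#"):
        print(repr(message))
```

With chunk_size=2 the demo prints two messages, both starting with the
"ip,port" header line. A report with fewer data lines than chunk_size comes
out as a single message, which corresponds to the else branch of the
pseudocode.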

A.
