[Intelmq-dev] Reports larger than 500 MB in IntelMQ

Otmar Lendl lendl at cert.at
Fri Jul 22 22:47:05 CEST 2016


On 22.07.2016 09:02, Dustin Demuth wrote:
> Dear all,
> 
> currently we are facing the problem, that IntelMQ is not capable of handling 
> large reports ( > 500 MB) when using Redis as a Message Queuing System.
> First we thought this might get fixed in most recent redis versions (see: 
> [1]), but apparently this is not the case.

Dustin,

500 MB is a significant amount of data, and IMHO passing chunks of that
size around in a RAM-based system is no longer sensible.

This does not only apply to IntelMQ: I've seen a number of code-bases
fall over when they were confronted with large data-sets.

(Once a student wrote a tool for us to detect CMS versions based on a
list of domains. All nice and fine for his test-data, but once we put
the 1M .at domains in, it broke down. The "load everything into the RAM"
approach has limits.)

My guess is that you hit this limit on some Shadowserver feeds.

A sensible approach is to add some sort of "split" option to the
collector bots. Yeah, this is not nice and perfect, but I'd add some
logic like

if (sizeof(collected_data) > limit) {
  grab header_line
  foreach chunk of X data-lines {
    push header_line + chunk of data-lines into Redis queue
  }
} else {
  push collected_data into Redis queue
}

Also (and I haven't checked this), make sure you never try to hold
everything in memory. Download to temp files and stream-process those.
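
To make the idea concrete, here is a minimal Python sketch of the split
logic above. It is not tested against IntelMQ and is not how the
collector bots work today. The names spool_to_tempfile, split_report,
size_limit and lines_per_chunk are made up for illustration. It assumes
the first line of the file is the header line, and it leaves the actual
push into the Redis queue to the caller.

```python
import itertools
import os
import shutil
import tempfile


def spool_to_tempfile(stream):
    """Copy a binary file-like object to a temp file in small blocks.

    Returns the path of the temp file; the caller has to delete it.
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=".report") as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


def split_report(path, size_limit, lines_per_chunk):
    """Yield the report as one string, or as header-prefixed chunks.

    Hypothetical helper: reports up to size_limit bytes are yielded
    unchanged; larger ones are read line by line so that only one
    chunk is held in memory at a time.
    """
    if os.path.getsize(path) <= size_limit:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()
        return

    with open(path, encoding="utf-8") as fh:
        header = fh.readline()  # assumption: the first line is the header
        while True:
            chunk = list(itertools.islice(fh, lines_per_chunk))
            if not chunk:
                break
            # Every chunk repeats the header so a parser can handle it alone.
            yield header + "".join(chunk)
```

A collector could then loop over split_report(path, limit, X) and push
each yielded string into the queue with whatever call it already uses
for a single report.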

If we're dealing with anything CSV-like, the only additional info that
piece of code needs is

* whether some extra comment lines need to be stripped
* whether there is a header line
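
Those two settings could be handled roughly as in the following Python
sketch. The function name strip_preamble and its parameters
comment_prefix and has_header are hypothetical, and I'm assuming that
the comment lines only occur at the top of the file, before the header.

```python
import itertools


def strip_preamble(lines, comment_prefix=None, has_header=True):
    """Return (header, data_lines) for an iterable of text lines.

    Leading lines that start with comment_prefix are dropped
    (assumption: comments only appear before the header). If
    has_header is False, header is None and the first remaining
    line is kept as data.
    """
    it = iter(lines)
    first = next(it, None)

    # Skip the leading comment lines, if a prefix was configured.
    if comment_prefix is not None:
        while first is not None and first.startswith(comment_prefix):
            first = next(it, None)

    if first is None:
        # Empty input, or nothing but comments.
        return None, iter(())
    if has_header:
        return first, it
    # No header: put the line we already consumed back in front.
    return None, itertools.chain([first], it)
```

Because it works on any iterable of lines, it can be fed an open file
object directly, so the data lines are still consumed one at a time
rather than loaded into memory.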

otmar
-- 
// Otmar Lendl <lendl at cert.at> - T: +43 1 5056416 711
// CERT Austria - http://www.cert.at/
// Eine Initiative der nic.at GmbH - http://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
