Dear IntelMQ-Devs,
whilst analysing our current setup and possible requirements, we discovered that an aggregation of events within IntelMQ might be a reasonable thing to do.
In [1] I sketched how this might work. I'd be glad if you could find the time to comment on this approach and participate in the discussion.
[1] https://github.com/certtools/intelmq/issues/751
Best regards
Dustin
On 21 Oct 2016, at 15:06, Dustin Demuth <dustin.demuth@intevation.de> wrote:
> Dear IntelMQ-Devs,
> whilst analysing our current setup and possible requirements, we discovered that an aggregation of events within IntelMQ might be a reasonable thing to do.
I am not sure if an aggregation *within* intelmq makes sense. The classical way would be to do an aggregation from a datastore/DB after intelmq puts it there.
We risk feature creep if we do that in intelmq!
I am involved with another project [1] where we explicitly deal with large amounts of data; there we process about 1 TB. We intentionally decided against aggregation within the ETL (extract, transform, load) part, which is the equivalent of intelmq.
I *highly* recommend taking a serious look at other ETL and aggregation tools and processes and then coming back to this discussion. Intelmq was not made for aggregation. Please let's keep these things separated, or at least not in the core part of intelmq. If aggregation makes sense for you within intelmq, no one is going to stop you. But I don't want to see that feature in the core part, because it's a different tool.
My 2 cents, a.
On Friday, 21 October 2016, 18:31:10, L. Aaron Kaplan wrote:
>> an aggregation of events within IntelMQ might be a reasonable thing to do.
> I am not sure if an aggregation *within* intelmq makes sense. The classical way would be to do an aggregation from a datastore/DB after intelmq puts it there.
The aggregation for email notifications in abuse handling is special. We are seeing this while building the solution for CERT-Bund.
The need here is not data collection for analysis, but just sending out one email. So the time frame is short.
> I *highly* recommend taking a serious look at other ETL and aggregation tools and processes and then coming back to this discussion. Intelmq was not made for aggregation.
In a data flow sense, the deduplicator already "aggregates". In the future, some abuse handling decisions will depend on seeing several sources report something; for this they will need to wait a bit, maybe just a few minutes, like the deduplicator.
The main question is: How many typical intelmq setups will want functionality that sends out an email? If many do, then email should be part of the core intelmq experience. And email means aggregating events for at least a few minutes, otherwise it is too much overhead.
Best Regards, Bernhard
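
For illustration only, the short-window aggregation discussed in this thread (hold events for a few minutes, then send one email per recipient) could be sketched roughly as follows. This is a minimal sketch and not IntelMQ code: the class WindowAggregator, its methods, the 300-second window, and the example addresses and events are all assumptions made up for this example, and no actual mail sending is shown.

```python
from collections import defaultdict


class WindowAggregator:
    """Hypothetical helper: collect events per recipient, release one batch per window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._batches = defaultdict(list)  # recipient -> pending events
        self._first_seen = {}              # recipient -> time of first pending event

    def add(self, recipient, event, now):
        """Queue an event; `now` is a timestamp in seconds."""
        self._first_seen.setdefault(recipient, now)
        self._batches[recipient].append(event)

    def flush_due(self, now):
        """Return {recipient: [events]} for every batch whose window has elapsed."""
        due = {}
        for recipient, started in list(self._first_seen.items()):
            if now - started >= self.window_seconds:
                due[recipient] = self._batches.pop(recipient)
                del self._first_seen[recipient]
        return due


# Illustrative usage with made-up recipients and events.
agg = WindowAggregator(window_seconds=300)
agg.add("abuse@example.net", {"source.ip": "192.0.2.1"}, now=0)
agg.add("abuse@example.net", {"source.ip": "192.0.2.2"}, now=120)
agg.add("abuse@example.org", {"source.ip": "198.51.100.7"}, now=200)

early = agg.flush_due(now=299)    # no window has elapsed yet
batches = agg.flush_due(now=300)  # the example.net window has elapsed
later = agg.flush_due(now=500)    # the example.org window has elapsed
```

Each returned batch would correspond to one outgoing email. Whether such a step belongs inside intelmq or behind a datastore is exactly the open question of this thread; the sketch takes no side on that.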