On 28.11.2023, at 10:42, Homma, L.J. (Luitzen) l.j.homma@minezk.nl wrote:
Thank you for all replies :) Thank you, Aaron, for pointing to the stats portal. I agree with your note. Based on all replies, it's interesting to hear how others are trying to solve the stats/monitoring questions. It's also good to point out that there is a difference between stats and monitoring. However, I think there is also quite some overlap because you probably want to set up monitoring/alerting based on your stats. Shiny management reports are indeed a different thing :)
Actually, that's a really good point I had not thought about. If your stats on incidents which go through intelmq go through the roof at some point (per classification.identifier or classification.type, etc.) then you might want to alert. Good point!
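To make the alerting idea above concrete, here is a minimal sketch of a spike check per classification.type. It is not part of IntelMQ; the function name `spiking_types`, the `sigmas` and `min_count` parameters, and the sample counts are all assumptions made up for illustration. It flags a type when its latest count exceeds the historical mean by a chosen number of standard deviations.

```python
from statistics import mean, stdev


def spiking_types(history, current, sigmas=3.0, min_count=10):
    """Return classification types whose latest count is far above baseline.

    history: dict mapping classification.type -> list of past per-interval counts
    current: dict mapping classification.type -> count in the latest interval
    (Hypothetical helper, not an IntelMQ API.)
    """
    alerts = []
    for ctype, count in current.items():
        past = history.get(ctype, [])
        if len(past) < 2 or count < min_count:
            continue  # no usable baseline yet, or too few events to matter
        threshold = mean(past) + sigmas * stdev(past)
        if count > threshold:
            alerts.append(ctype)
    return sorted(alerts)


# Made-up example numbers: phishing jumps tenfold, scanner stays in range.
history = {"phishing": [40, 42, 38, 41], "scanner": [100, 120, 110, 90]}
current = {"phishing": 400, "scanner": 115}
print(spiking_types(history, current))  # ['phishing']
```

The same check could be run per classification.identifier; the counts themselves would come from whatever stats store is in use.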
Monitoring and dashboarding itself probably should not be part of the IntelMQ core.
Totally agree. If we keep the focus of intelmq on the hard part of data collection, filtering, enriching and ingestion into some DB, some ELK or Splunk or whatever, then we already solve a hard problem. If we diversify too much, we risk doing everything but nothing really well.
However, I think it would be very useful to share monitoring/alerting options in the documentation or share code (bots) to feed your stats/reporting/dashboarding solution. Everyone is free to build a configuration that fits their needs, but could use a default bot to feed their reporting engine.
True, like HOWTOs or examples?
Our goal for stats in more detail:
- To get technical insights into what is coming into the pipeline. So, event counts per bot, but also more granular numbers for each report/feed.
- Get insights on how many events are deduped.
- We have a lot of filter bots in our environment. We want stats on the event filter ins/outs.
- We have multiple output bots where we also want stats.
- In the end, you want a kind of CRC paper trail check to ensure your line is in control and not "losing" any data on the way in the journey through the data pipeline(s) :)
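The "paper trail" goal in the list above can be sketched as a simple reconciliation: for every bot, events received should equal events forwarded plus events deliberately dropped (filtered or deduplicated). The following is only an illustration under assumptions; the `BotCounts` structure, the `unaccounted` helper, the bot names, and the numbers are hypothetical, and IntelMQ does not necessarily expose counters in this shape.

```python
from dataclasses import dataclass


@dataclass
class BotCounts:
    """Hypothetical per-bot counters for one time window."""
    received: int
    forwarded: int
    dropped: int = 0  # deliberately discarded: filtered out or deduplicated


def unaccounted(counts):
    """Return {bot_id: missing_events} where received != forwarded + dropped."""
    return {
        bot_id: c.received - (c.forwarded + c.dropped)
        for bot_id, c in counts.items()
        if c.received != c.forwarded + c.dropped
    }


# Made-up numbers: the filter bot cannot explain 5 of its 900 input events.
counts = {
    "deduplicator-expert": BotCounts(received=1000, forwarded=900, dropped=100),
    "filter-expert": BotCounts(received=900, forwarded=600, dropped=295),
    "postgres-output": BotCounts(received=600, forwarded=600),
}
print(unaccounted(counts))  # {'filter-expert': 5}
```

A bot that writes to several destination queues would need its forwarded count defined per queue or per unique event, otherwise the balance will not add up.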
@Aaron, we will use your repo for inspiration and will probably develop our stats needs further in 2024. If it's relevant to the community, we'll see if we can share it.
That would be lovely. Note that mine is already a bit outdated, but - yes, as you said, it might give you some inspiration.
Best, Aaron.
Regards,
Luitzen Homma
-----Original message----- From: IntelMQ-dev intelmq-dev-bounces@lists.cert.at On behalf of L. Aaron Kaplan Sent: Tuesday, 28 November 2023 10:31 To: Bernhard Reiter bernhard@intevation.de CC: intelmq-dev@lists.cert.at Subject: Re: [IntelMQ-dev] timescale db is non-free (was: Stats in Intelmq?)
Good idea.
But please note that TimescaleDB is
- optional (plain PostgreSQL with the proper indices usually does the job just as well)
- really just a few extensions on top of plain PostgreSQL.
So, I don't really see a lock-in here.
But a notice would be great. Maybe as part of any PR which includes more timescaledb support...
My 2 cents of experience with working with timescaledb.
PS: back then at CERT.at we used timescaledb initially for https://github.com/certtools/stats-portal already and it is just great for its purpose: to do very fast SELECT ... GROUP BY .. WHERE timestamp in INTERVAL(...) queries. That's where it shines (and coincidentally that's often what you need for data-cubes/statistics on large DBs). But you can achieve similar functionality also with plain postgresql or other databases. I see that as a functionality which may be replaced by other similar systems - if needed (and that depends only on the size of your eventDB).
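As an illustration of the query shape described above (count events per time bucket and group, restricted to a time window), here is a small runnable sketch. It uses Python's built-in sqlite3 purely as a stand-in so the example is self-contained; the table layout, sample rows, and index name are assumptions and not the stats-portal schema. On PostgreSQL the bucketing would typically use date_trunc, and on TimescaleDB time_bucket, instead of SQLite's date().

```python
import sqlite3

# In-memory stand-in for an event database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events ("time.source" TEXT, "classification.type" TEXT)')
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2023-11-27 09:00:00", "phishing"),
        ("2023-11-27 17:30:00", "scanner"),
        ("2023-11-28 08:15:00", "phishing"),
        ("2023-11-28 09:45:00", "phishing"),
        ("2023-11-20 12:00:00", "scanner"),  # outside the window queried below
    ],
)
# An index on the timestamp column is what keeps the window filter cheap.
conn.execute('CREATE INDEX idx_events_time ON events ("time.source")')

query = """
    SELECT date("time.source") AS day,
           "classification.type" AS ctype,
           COUNT(*) AS n
    FROM events
    WHERE "time.source" >= ? AND "time.source" < ?
    GROUP BY day, ctype
    ORDER BY day, ctype
"""
rows = conn.execute(query, ("2023-11-27 00:00:00", "2023-11-29 00:00:00")).fetchall()
for row in rows:
    print(row)
```

With the sample rows above this yields one row per day and type inside the window; whether plain indices are enough in practice depends, as noted, on the size of the event database.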
On 28.11.2023, at 10:15, Bernhard Reiter bernhard@intevation.de wrote:
Hello,
On Monday, 27 November 2023, 12:23:56, Kamil Mankowski via IntelMQ-dev wrote:
However, I can say what is currently available, and how I use/plan to use it:
- for final events, we use - as mentioned by Aaron - a database. We are going to use Timescale DB,
note that the more recent features of timescale db are non-free (aka proprietary) software.
https://docs.timescale.com/about/latest/timescaledb-editions/ "Many of the most recent features of TimescaleDB are only available in TimescaleDB Community Edition."
"You cannot sell TimescaleDB Community Edition as a service"
"You can modify the TimescaleDB Community Edition source code and run it for production use."
Maybe adding a proper warning against the lock-in in the document would match IntelMQ's idea to stay open.
Regards Bernhard
-- https://intevation.de/~bernhard +49 541 33 508 3-3 Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998 Geschäftsführer: Frank Koormann, Bernhard Reiter
This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message. The State accepts no liability for damage of any kind resulting from the risks inherent in the electronic transmission of messages.