Dear IntelMQ Developers and Users,
security incidents are more relevant today than they were ten years ago.
As IntelMQ can be used as a core element for automated security incident
handling, we would like to provide a way to share information with other
IntelMQ instances. This proposal is also an alternative to IEP03 insofar
as the "multiple values" problem can be solved by using UUIDs to "link"
related events in a backwards-compatible manner.
If you're interested, please let us know, so we can organize a
hackathon for further discussions about the specification of the
meta-information.
Previously this idea was discussed in [0] and [1].
[0]
https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectu…
[1] https://github.com/certtools/intelmq/issues/1521
# IEP04: Internal Data Format: Meta Information and Data Exchange
To ease data exchange between two or more IntelMQ instances, adding some
meta-information to the events can make this sharing easier in certain
regards.
"Linking" events could follow the same idea `git` uses with parent
hashes (here we would call them UUIDs).
### TL;DR
Enable communication between two or more IntelMQ instances and exchange
data in a backwards-compatible format. Whether to use a P2P or a
centralized architecture is a big topic, which has to be discussed after
the format is set.
### Why is metadata important?
Short and simple: to avoid race conditions and to be able to discard/drop
already-processed events from other instances.
### Meta information
Metadata is used to transfer general data which is not directly related
to the event itself. It is more or less information to keep events clear
and sortable.
A message could look like:
{
    "meta": {
        "version": 1,  # protocol version, so we can fall back to old versions too
        "uuid": {
            "current": "cert_at:aaaa-bbbb-cccc-dddd",  # format to be decided
            "parent": "cert_at:xxxx-yyyy-zzzz-ffff"  # format to be discussed; if not set, current is the parent UUID
        },
        "type": "event",
        "format": "intelmq"  # i.e. this field could contain "n6" or "idea", so the receiving component can decode on demand
    },
    "payload": {  # normal IntelMQ data
        "source.ip": "127.0.0.1",
        "source.fqdn": "example.com",
        "raw": base64-blob
    }
}
Tell us your opinion about adding non-standardized meta-information
fields (e.g. RTIR ticket number, origin, other local contact
information, and so on).
#### The UUID
For the UUID there are multiple options:
1. Generate a random 128 bit UUID
2. A list of entities, which dealt with this event already. For example
if an event was passed on from cert-at to cert-ee, the field could look
like `!cert-at!cert-ee`. A message sending loop can be detected if the
own name is already in this field upon reception.
3. Using CyCat: `publisher-short-name:project-short-name:UUID`. For
example: `cert-at:intelmq:72ddb00c-2d0a-4eea-b7ac-ae122b8e6c3b`, or
`cert-pl:n6:f60c9fb9-81f9-4e0b-8a44-ea41326a15b3`. Some more research
and discussion is required before the implementation of this option.
Have a look at https://www.cycat.org/services/concept/ for more details.
4. A hash: a benefit of using a hash is that we are able to recalculate
it on every IntelMQ instance.
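As a sketch of options 1 and 3, generating such identifiers in Python is straightforward. The `make_event_id` helper and the `cert-at:intelmq` values below are only illustrative; the actual format is exactly what this IEP leaves open for discussion:

```python
import uuid

# Option 1: a random 128-bit UUID (RFC 4122, version 4)
random_id = str(uuid.uuid4())

# Option 3: CyCat-style "publisher-short-name:project-short-name:UUID"
def make_event_id(publisher: str, project: str) -> str:
    """Build an identifier like 'cert-at:intelmq:<uuid4>' (illustrative format)."""
    return f"{publisher}:{project}:{uuid.uuid4()}"

print(make_event_id("cert-at", "intelmq"))
```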
### Exporting events to other systems
In IntelMQ 2.x the events only comprise the "payload" and no meta
information. For local storage like file output or databases, the meta
information may not be relevant in some use-cases. So it needs to be
possible to export events *without* meta information, which is also the
backwards-compatible behaviour.
The "type" field exists in the current format as "__type" in the flat
payload structure. In the output bots there's currently a boolean
parameter `message_with_type` to include the field `__type` in the "export".
For optionally exporting meta-information like the UUID or the format, a
similar logic could be used.
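A minimal sketch of that logic, assuming plain dicts: the `export_meta` parameter and the `__uuid`/`__format` field names are invented here for illustration, analogous to the existing `message_with_type`/`__type` mechanism, and are not part of any current IntelMQ API:

```python
def to_export(message: dict, export_meta: bool = False) -> dict:
    """Flatten a message for output; meta is dropped unless requested."""
    payload = dict(message["payload"])  # copy: keep the original intact
    if export_meta:
        # Prefixed keys to avoid clashes with harmonization fields
        payload["__uuid"] = message["meta"]["uuid"]["current"]
        payload["__format"] = message["meta"]["format"]
    return payload
```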
### How can data exchange work?
This depends on how IntelMQ instances communicate: either peer-to-peer
or via a central data hub. Both have pros and cons.
#### P2P (Peer-to-Peer)
Decentralized network
+ Less downtime: an outage of one instance does not affect the whole
network
+ Better privacy: data is not shared to an unrelated instance
+ More secure: data can optionally be encrypted (key-exchange between
instances?)
+ Decentralized and local maintenance
~ Network latency depends on server locations
- Networking issues may occur
How data exchange could look between instances:
1) Instance A has events which should be relayed to Instances B & C,
because it is not sure who the actual receiver should be
2) Instance A ensures all messages have a UUID
3) Instance A sends the data to Instance B & Instance C
4) Instance B checks the data and determines that the data is meant for
Instance C
5) Instance C receives data from Instance A & Instance B
6) Instance C checks the UUID, notices it is the same, and drops the
package from Instance B
#### (Central) Data hub
+ Less maintenance: Is maintained by the hub administrator
+ Central data storage (reports can optionally be cached to be
downloaded later)
~ Central data analysis (e.g. statistics) is possible
~ Network latency depends on server locations
- Single point of failure: if network problems occur, no exchange is
possible
As already seen above, data exchange here would be less complicated. The
sending side may look like:
1) Instance A has events which should be relayed to Instance B (e.g.
different country)
2) Instance A ensures all messages have a UUID
3) Instance A sends these messages to the data hub
The reception side can look like:
1) Instance B connects to central instance
2) Instance B queries and downloads all available messages
3) Upon reception, all messages are de-duplicated based on the UUID:
a) If the UUID is already known, discard the message
b) If the UUID has not been seen before, continue with processing
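The de-duplication in step 3 can be sketched with a simple seen-set; a real deployment would likely use a persistent store (e.g. Redis) instead of an in-memory set, and the function below is illustrative, not an existing IntelMQ API:

```python
def deduplicate(messages, seen_uuids):
    """Yield only messages whose UUID has not been processed yet."""
    for message in messages:
        uid = message["meta"]["uuid"]["current"]
        if uid in seen_uuids:
            continue  # 3a) UUID already known: discard the message
        seen_uuids.add(uid)  # 3b) not seen before: continue processing
        yield message

seen = set()
batch = [
    {"meta": {"uuid": {"current": "cert-at:intelmq:1111"}}},
    {"meta": {"uuid": {"current": "cert-at:intelmq:1111"}}},  # duplicate
    {"meta": {"uuid": {"current": "cert-ee:intelmq:2222"}}},
]
fresh = list(deduplicate(batch, seen))  # two unique messages remain
```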
To sum up, both exchange variants are useful. More research is needed,
e.g. on a mixed infrastructure with centralized parts that can also be
decentralized. However, that is neither the purpose nor the aim of this
IEP.
--
// Sebastian Waldbauer <waldbauer(a)cert.at> - T: +43 1 5056416 7202
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
Dear community,
April is nearing its end and it's time to release a bunch of bugfixes.
Please find below the list of changes. Thanks to all contributors for
the issues reported and pull requests!
The new version is already available on GitHub, PyPI, the deb+rpm
repositories and DockerHub.
Installation documentation:
https://intelmq.readthedocs.io/en/maintenance/user/installation.html
Upgrade documentation:
https://intelmq.readthedocs.io/en/maintenance/user/upgrade.html
### Core
- `intelmq.lib.harmonization`:
- `TLP` type: accept value "yellow" for TLP level AMBER.
### Bots
#### Collectors
- `intelmq.bots.collectors.shadowserver.collector_reports_api`:
- Handle timeouts by logging the error and continuing to next report
(PR#1852 by Marius Karotkis and Sebastian Wagner, fixes #1823).
#### Parsers
- `intelmq.bots.parsers.shadowserver.config`:
- Parse and harmonize field `end_time` as date in Feeds
"Drone-Brute-Force" and "Amplification-DDoS-Victim" (PR#1833 by Mikk
Margus Möll).
- Add conversion function `convert_date_utc` which assumes UTC and
sanitizes the data to datetime (by Sebastian Wagner, fixes #1848).
- `intelmq.bots.parsers.shadowserver.parser_json`:
- Use the overwrite parameter for optionally overwriting the
"feed.name" field (by Sebastian Wagner).
- `intelmq.bots.parsers.microsoft.parser_ctip`:
- Handle fields `timestamp`, `timestamp_utc`, `source_ip`,
`source_port`, `destination_ip`, `destination_port`, `computer_name`,
`bot_id`, `asn`, `geo` in `Payload` of CTIP Azure format (PR#1841,
PR#1851 and PR#1879 by Sebastian Wagner).
- `intelmq.bots.parsers.shodan.parser`:
- Added support for unique keys and verified vulns (PR#1835 by Mikk
Margus Möll).
- `intelmq.bots.parsers.cymru.parser_cap_program`:
- Fix parsing in whitespace edge case in comments (PR#1870 by Alex
Kaplan, fixes #1862).
#### Experts
- `intelmq.bots.experts.modify`:
- Add a new rule to the example configuration to change the type of
malicious-code events to `c2server` if the malware name indicates c2
(PR#1854 by Sebastian Wagner).
- `intelmq.bots.experts.gethostbyname.expert`:
- Fix handling of parameter `gaierrors_to_ignore` with value `None`
(PR#1890 by Sebastian Wagner, fixes #1886).
#### Outputs
- `intelmq.bots.outputs.elasticsearch`: Fix log message about the
required elasticsearch library (by Sebastian Wagner).
### Documentation
- `dev/data-harmonization`: Fix taxonomy name "information gathering"
should be "information-gathering" (by Sebastian Wagner).
### Tests
- `intelmq.tests.bots.parsers.microsoft.test_parser_ctip_azure`:
- Add test case for TLP level "YELLOW".
### Known issues
- ParserBot: erroneous raw line recovery in error handling (#1850).
--
// Sebastian Wagner <wagner(a)cert.at> - T: +43 676 898 298 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
Dear *,
after getting rid of the BOTS and the defaults.conf files (see [0]) we
spent the last weeks merging the pipeline.conf and changing the format
of the runtime file to YAML.
The pipeline configuration is now part of the individual bot
configuration in the runtime definition. In addition, there is no more
need to define a source queue for bots, by default the source queue is
called '{botid}-queue', but you can override the name by setting the
source_queue setting.
Another change, one regarding the definition of the destination queues,
is that we will drop support for defining those using a string or a list
of strings. Only named queues [1] are supported for destination_queues,
starting with IntelMQ 3.0.
The announced switch to YAML as the format of the runtime configuration
is now also merged. We used this change to also change the file
extension from .conf to .yaml, which means having syntax highlighting
without having to modify your editor's configuration ;)
IntelMQ will rename the file for you if it finds a runtime.conf file,
and it also updates the format to YAML once it writes the configuration
(i.e. during an upgrade step or if you change a bot configuration using
intelmqctl).
We also started on updating the documentation to reflect the changes,
feel free to browse the latest version online [2].
cheers,
Birger
[0] https://lists.cert.at/pipermail/intelmq-dev/2021-March/000419.html
[1]
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html…
[2]
https://intelmq.readthedocs.io/en/latest/user/configuration-management.html
--
// Birger Schacht <schacht(a)cert.at>
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
Dear IntelMQ Developers and Users,
an evaluation of current challenges with the internal data format led to
the idea of allowing multiple values for one field in IntelMQ 3.0
(scheduled for June 2021)[0].
The idea is described below, including various advantages and
disadvantages. We appreciate your input, opinion and analysis of further
implications on this idea.
We plan to evaluate the feedback that emerged in two weeks.
[0]
https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectu…
"The new IDF shall support (sorted) lists of IPs, domains,
taxonomy categories, etc. By convention the most relevant item in such a
list MUST be the first item in the sorted list."
https://github.com/certtools/intelmq/blob/version-3.0-ideas/docs/architectu…:
n6-system
"Since the new IDF shall support multiple values, mapping to n6
should be rather easy."
## Use-cases
### Network information
IntelMQ's format currently allows for *exactly one* value per field. For
example, every event can have *one* `source.ip` and *one* `source.fqdn`.
In some use-cases, multiple values can be useful, for example when
querying DNS information. One domain (`source.fqdn`) can point to
multiple IP addresses (`source.ip`). The other way round - multiple
domains pointing to the same IP address - is also very common. The
use-case where this first appeared was that one IP address can be part
of multiple autonomous systems (`source.asn`).[1][2][3]
See the examples below in section Format.
[1]: "Multiple ASNs/networks per IP? #543"
https://github.com/certtools/intelmq/issues/543
[2]: "BOT: DNS lookup #373" https://github.com/certtools/intelmq/issues/373
[3]: "reverse DNS: Only first record is used"
https://github.com/certtools/intelmq/issues/877
### Classification
Another use-case is to use multiple classifications.[5] For example, if
a website was hacked and used for a phishing page, it can be assigned
two classifications:
For the hacking: Taxonomy: information-content-security, type:
unauthorised-modification-of-information
For the phishing page: Taxonomy: fraud, type: phishing
Another example are reachable network services which should not be
accessible from the internet. Shadowserver provides a lot of this data.
Open XDMCP instances are both DDoS amplifiers and Potentially unwanted
accessible systems. Therefore both classifications apply:
Taxonomy: vulnerable, type: ddos-amplifier
Taxonomy: vulnerable, type: potentially-unwanted-accessible-system
A list of all types in the RSIT can be found in the RSIT repository[6].
[5]:
https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/…
[6]:
https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/…
## Format
Some examples:
{"source.ip": ["192.0.43.8"], "source.asn": [16876, 40528]}

{"source.ip": ["10.0.0.1", "10.0.0.2"],
 "source.url": ["http://example.com/", "http://example.net"]}

{"classification.taxonomy": ["information-content-security", "fraud"],
 "classification.type": ["unauthorised-modification-of-information", "phishing"],
 "source.url": ["http://example.com/"],
 "source.ip": ["10.0.0.1", "10.0.0.2"]}
In the bots' code, multiple values need to be taken care of. For
example, instead of:
ip_addr = event["source.ip"]
# do stuff
it is necessary to loop over the values:
for ip_addr in event["source.ip"]:
    # do stuff
This logic is required for *all* fields which can have multiple values,
therefore nested loops may be necessary.
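As an illustration of such nested loops (a sketch only; the plain dict below stands in for a multi-value event, not an actual IntelMQ Event object):

```python
event = {
    "source.ip": ["10.0.0.1", "10.0.0.2"],
    "source.url": ["http://example.com/", "http://example.net"],
}

# Every combination of IP and URL has to be considered
pairs = []
for ip_addr in event["source.ip"]:
    for url in event["source.url"]:
        pairs.append((ip_addr, url))  # 2 x 2 = 4 combinations
```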
Everything which processes IntelMQ data needs to be adapted, including
databases. See the "Disadvantages" section below.
### Optional back-conversion ("value-explosion")
One variant/option of this IEP is to create a conversion layer from the
new multi-value format to the old one-value format by creating multiple
events with only one value per field. Using this conversion,
compatibility with external components can be kept, while the advantages
only exist inside the IntelMQ core (i.e. the bots).
Examples:
{"source.ip": ["127.0.0.1", "127.0.0.2"], "source.fqdn": ["example.com"]}
-> {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
   {"source.ip": "127.0.0.2", "source.fqdn": "example.com"}

{"source.ip": ["127.0.0.1", "127.0.0.2"],
 "source.fqdn": ["example.com", "example.org"]}
-> {"source.ip": "127.0.0.1", "source.fqdn": "example.com"},
   {"source.ip": "127.0.0.1", "source.fqdn": "example.org"},
   {"source.ip": "127.0.0.2", "source.fqdn": "example.com"},
   {"source.ip": "127.0.0.2", "source.fqdn": "example.org"}
### What will change?
We'll change the behaviour of the current IntelMQ internal parsing
process, i.e. you'll be able to add multiple IP addresses to one field;
data which would previously be handled as multiple events is merged into
one event. This will allow us to combine e.g. a domain with multiple IP
addresses into one event.
### Advantages
Supporting multiple values allows us to add multiple IP addresses to one
event. As opposed to using multiple events with nearly similar data, the
multiple-value approach reduces data duplication and has less overhead,
while on the other hand the complexity increases.
If multiple events were used instead, related events would need to be
linked together by other means (see section Alternatives below).
### Disadvantages (breaking behaviour)
The complexity in IntelMQ and all linked components increases without
doubt. All components dealing with the IntelMQ-data need to be adapted
to deal with multiple values. This includes all bots, and IntelMQ
administrators need to adapt their configurations (e.g. filters, etc.)
as well.
Without the explosion variant, all connected databases need to be
adapted additionally (e.g. PostgreSQL, SQLite, Elastic, MongoDB etc.),
and all software which processes data from IntelMQ needs to be adapted.
PostgreSQL supports arrays for columns, but the schema conversion can be
complex and resource-hungry.
IntelMQ has followed the KISS ("keep it simple, stupid")[4] principle
from its beginning. It is disputable whether multiple values break this
principle.
[4]: https://en.wikipedia.org/wiki/KISS_principle
## Alternatives
An alternative to using multiple values per field is to set unique
identifiers (e.g. UUIDs) per event and let events with the same origin
have the same "parent" identifier. This way, related events can be
linked and compatibility is easier to maintain. Relating the events to
each other requires extra steps though, but keeps the KISS principle.
This approach will be described in IEP04.
To solve the use-case of multiple classifications per event, the primary
and most important classification can be used instead of multiple ones.
A possible solution for the classification use-case above would be some
sort of tagging - in short, "tags". E.g.
{
"source.ip": ["192.0.43.8"],
"source.asn": [16876, 40528],
"tags": ["ddos-amplifier", "info-disclosure", "mirai-botnet"]
}
## Other IoC processing formats
For reference, we describe the formats of other IoC-processing systems
similar to IntelMQ. Both formats, IDEA and n6, support multiple values
in different ways. If you know of other similar formats supporting
multiple values, please speak up!
### "IDEA"
The IDEA format, used by the CESNET-developed Warden, supports multiple
values for some fields. But the data format structure differs clearly
from IntelMQ's, as you can see in the example below. The classification
is defined per address, and network ranges are possible as addresses,
which is not supported in IntelMQ.
IDEA was designed from scratch to overcome disadvantages of Warden's
previous data format.
Example:
"Source": [
{
"Type": ["Phishing"],
"IP4": ["192.168.0.2-192.168.0.5", "192.168.0.10/25"],
"IP6": ["2001:0db8:0000:0000:0000:ff00:0042::/112"],
"Hostname": ["example.com"],
"URL": ["http://example.com/cgi-bin/killemall"],
"Proto": ["tcp", "http"],
"AttachHand": ["att1"],
"Netname": ["ripe:IANA-CBLK-RESERVED1"]
}
],
"Target": [
{
"Type": ["Backscatter",
"OriginSpam"],
"Email": ["innocent(a)example.com"],
"Spoofed": true
},
{
"IP4": ["10.2.2.0/24"],
"Anonymised": true
}
]
Upstream documentation:
https://idea.cesnet.cz/en/index
https://warden.cesnet.cz/en/index
### n6
In the n6 format, the `addr` field is a list of objects with `ip`,
`asn`, `cc` and `dir` fields. `addr` is similar to IntelMQ's `source`
namespace, but the number of fields in `addr` is much lower and the
"direction" of the address is given by a field inside the `addr` item.
Example:
[{"ipv6": "abcd::1", "cc": "PL", "asn": 12345, "dir": "dst"}]
Upstream documentation:
https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-addressfie…
https://n6sdk.readthedocs.io/en/latest/tutorial.html#field-class-extendedad…
--
// Sebastian Waldbauer <waldbauer(a)cert.at> - T: +43 1 5056416 7202
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg
Hi everyone,
I wanted to send back a better write-up of my stance on IEP03 ("multiple values" in the IntelMQ internal data format).
Alas, I was quite busy and I am sprinting to push out some code for our deadline at work.
However, let me summarise it:
I think the IEP03 is very well written, thank you a lot for this! Thinking this through was important and I think Sebastian Waldbauer did a great job.
Reading it, I realised that my initial proposal of having multiple values is really breaking the KISS principle of IntelMQ in a bad way. Worse than I had thought. So, I am thinking of retracting the proposal.
However, .... https://github.com/certtools/ieps/tree/main/003#alternatives has a good core in it.
If we have multiple values, instead of doing the n x m complexity explosion, we link different events (JSON rows) together via UUIDs; this gives us what we need:
* UUIDs help with deduplication! That's important when linking IntelMQ instances!
* lower complexity / keep the KISS principle
* consumers can ignore the UUID-linking if it's not relevant for them (f.ex enrichment processes/bots)
* we can still represent linked events.
I would like to add one little but important thing for the UUID linking idea: add a "link-type".
Examples for link-types:
* parent-child event
* grouping types (all of these events belong to the same report)
etc.
With this triplet information, we are close to RDF (left-side, type, right-side) and thus we can (future-proof) represent any type of relation.
A list of valid types needs to be documented in the IDF format page of course.
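Such a triplet could be represented as simply as this (the field names and values are invented for illustration; the actual representation would be part of the IDF specification):

```python
# One link between two events, RDF-style: (left-side, type, right-side)
link = {
    "left": "cert-at:intelmq:aaaa-bbbb",   # this event's UUID
    "type": "parent-child",                 # from the documented list of link-types
    "right": "cert-at:intelmq:cccc-dddd",  # the related event's UUID
}
```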
So, I think with that, we can go ahead.
Thanks,
a.
PS: and sorry that my feedback came a bit late, as said - code sprints.
Hi Drupad,
Am Donnerstag 15 April 2021 17:56:18 schrieb Soni, Drupad:
> Also I want your help in setting up misp output feed as below.
your image showed that you want all events to go into MISP as well
using
https://github.com/certtools/intelmq/blob/develop/intelmq/bots/outputs/misp…
> Feed is working fine adding feed in misp doesn't show any feeds there.
> I am not sure what is the gap here.
Me neither; my experience with MISP is limited, there are many functions
and ways to manually use MISP. When following the documentation, I could
make the API work, but I've not tested the feed. One possibility you
have is to ask the MISP people about how to further analyse the
situation (please give them all the details).
Best Regards,
Bernhard
--
www.intevation.de/~bernhard +49 541 33 508 3-3
Intevation GmbH, Osnabrück, DE; Amtsgericht Osnabrück, HRB 18998
Geschäftsführer Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner
Hi,
Something fails in my unit tests for my output bot: I have a test following this outline:
def test_ok_events(self):
    for event in [firstevent, secondevent]:
        self.input_message = event
        self.run_bot(parameters={'logging_level': 'DEBUG', ... some other bot specific parameters here ...},
                     iterations=1, allowed_error_count=0, allowed_warning_count=0)
Above firstevent and secondevent are JSON structures that conform to the Event class.
The first event is handled ok, but the second seems to get handled, yet in the end
the run_bot method in /opt/dev_intelmq/intelmq/lib/test.py claims things are not ok.
Next, I raise allowed_warning_count to 4 (other bot parameters remain untouched) and the same thing
happens although the bot progresses a bit further down the run_bot method and then prints the
following traceback message in my IDE's console:
---
Failure
Traceback (most recent call last):
File "/opt/dev_intelmq/intelmq/tests/bots/outputs/mybot/test_output.py", line ABC, in test_ok_events
}, allowed_warning_count=4)
File "/opt/dev_intelmq/intelmq/lib/test.py", line 356, in run_bot
''.format(fields['message']))
AssertionError: False is not true : Logline "/opt/dev_intelmq/intelmq/lib/test.py:232: ResourceWarning: unclosed <ssl.SSLSocket ... >" does not end with .? or !.
---
I guess the unclosed socket is due to my bot using Sessions from the requests module to send data
to a back end service and this is not taken down between the two invocations of the run_bot method.
Any ideas how I should modify the test to take this session down before the second event is tested?
Or, maybe better still, how should I modify my bot to close this session down cleanly at exit?
By overriding a suitable stop or shutdown method and closing the session inside this overriding method?
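For reference, the pattern I have in mind would look roughly like the stub below. Whether IntelMQ's Bot base class really invokes a shutdown() hook on stop is exactly what I am unsure about and would need to be verified against the source; the stub base class here only illustrates the override pattern, and the boolean flag stands in for closing a real requests.Session:

```python
class Bot:
    """Minimal stand-in for intelmq.lib.bot.Bot: stop() calls shutdown()."""
    def stop(self):
        self.shutdown()

    def shutdown(self):
        pass  # hook for subclasses to release resources

class MyOutputBot(Bot):
    def __init__(self):
        # In the real bot this would be requests.Session()
        self.session_closed = False

    def shutdown(self):
        # Close the HTTP session so no ssl.SSLSocket outlives the bot;
        # real code: self.session.close()
        self.session_closed = True

bot = MyOutputBot()
bot.stop()
# bot.session_closed is now True
```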
Thanks again, Mika
Dear community,
Thank you for your input and interest in the proposed changes to
IntelMQ's internal data format, IEP03 and IEP04. To move the discussion
forward and hopefully reach some conclusions, we propose to do a small
*hackathon*. We think this would also be a great opportunity to bring
the IntelMQ community together.
If you are interested, please add your availability here until Tuesday
20th April:
https://www.termino.gv.at/meet/en/p/219061bf603a2dbb91efe8af61ee4718-64494…
The poll offers several 2h-slots in the week of 19th-23rd April; we plan
to select a date on Wednesday 21st April and announce it here, together
with a location.
kind regards
Sebastian
--
// Sebastian Wagner <wagner(a)cert.at> - T: +43 1 5056416 7201
// CERT Austria - https://www.cert.at/
// Eine Initiative der nic.at GmbH - https://www.nic.at/
// Firmenbuchnummer 172568b, LG Salzburg