Hi folks,
an interesting discussion started on https://github.com/certtools/intelmq/issues/487
Basically, Pedro would like to send data to ArcSight in CEF format.
I would like to propose that we enhance our architecture to include "transformer bots". Apart from the funny name, which reminds me of http://transformers.hasbro.com/en-us/bots, this has a nice symmetry with the collector -> parser chain we have on the input side.
So what are we talking about?
Input side:
```
Collector ---> Parser ---> .... pipeline
```
Currently on the output side it's like this:
```
... pipeline -> expert1 -> expert2 -> ... -> output bot 1
```
But this basically leaves the work of transcoding/transforming the DHO format to the output bots. So far, this was enough.
In the general case, it would be great to transform the DHO to some format. Let's assume CEF or IODEF.
A generic output bot could then take that data and send it over whatever mechanism it knows to a destination.
Example:
```
... pipeline -> expert1 -> transform 2 IODEF -> mail bot
or:
... pipeline -> expert1 -> transform 2 IODEF -> mail bot
                                            \-> sftp bot
```
The question of where to store the output data arose in https://github.com/certtools/intelmq/issues/487. I'd suggest creating a new "raw_output" field in the DHO for this.
A transformer bot shall write its output format to the raw_output field (base64-encoded), and an output bot shall take that data, decode the base64 and send it via its own mechanism.
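To make the raw_output idea concrete, here is a minimal sketch using plain dicts instead of the real bot classes. The function names are hypothetical, and JSON is only a stand-in for CEF or IODEF; the one thing taken from the proposal is the base64-encoded "raw_output" field.

```python
import base64
import json


def transform_event(event: dict) -> dict:
    """Hypothetical transformer step: serialise the event and store the
    result base64-encoded in "raw_output". JSON stands in for CEF/IODEF."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    transformed = dict(event)  # copy, so the incoming event is not modified
    transformed["raw_output"] = base64.b64encode(payload).decode("ascii")
    return transformed


def read_raw_output(event: dict) -> bytes:
    """Hypothetical output-bot step: decode the payload before sending it."""
    return base64.b64decode(event["raw_output"])


if __name__ == "__main__":
    event = {"source.ip": "192.0.2.1", "classification.type": "malware"}
    transformed = transform_event(event)
    # An output bot would hand these bytes to mail, sftp, etc.
    print(read_raw_output(transformed).decode("utf-8"))
```

With this split the output bot does not need to know which format it is shipping; it only decodes and sends bytes.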
What do you think? Would this work? Do you see any serious problem with this approach?
Best, a.
On 19.04.2016 23:48, L. Aaron Kaplan wrote:
> I would like to propose that we enhance our architecture to include "transformer bots".
I'd call them "output transformations".
> What do you think? Would this work? Do you see any serious problem with this approach?
This approach is good, but I see one point that should be taken into account:
The parser bot usually creates multiple events from one input event (e.g. the collector retrieves a larger CSV file as a single event, and the parser creates one event per line of that file).
On the output side we *can* have a similar process, just in reverse: multiple events can end up in one email that is sent to e.g. ISPs.
Thus: on the output side there is not just the question of the transformation to cybox/csv/xml/xarf/..., but also the question of aggregation: Which set of events should be grouped together?
Yes, there will be cases where a simple event-by-event translation is useful, but my gut feeling is that this is the exception.
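As a rough illustration of the aggregation question, the sketch below groups events by a single key before anything is transformed or sent. The helper name is hypothetical, and grouping by "source.abuse_contact" is only an assumption about what a sensible grouping key might be; real grouping rules would likely be more involved.

```python
from collections import defaultdict


def group_events(events, key="source.abuse_contact"):
    """Hypothetical aggregation step: collect events that share the same
    value for `key`, so each group can become one email or one file."""
    groups = defaultdict(list)
    for event in events:
        # Events without the key end up in a catch-all "unknown" group.
        groups[event.get(key, "unknown")].append(event)
    return dict(groups)


if __name__ == "__main__":
    events = [
        {"source.ip": "192.0.2.1", "source.abuse_contact": "abuse@isp-a.example"},
        {"source.ip": "192.0.2.2", "source.abuse_contact": "abuse@isp-b.example"},
        {"source.ip": "192.0.2.3", "source.abuse_contact": "abuse@isp-a.example"},
    ]
    for contact, batch in group_events(events).items():
        print(contact, len(batch))
```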
I don't have a full-blown proposal ready in my mind, so this is just food for thought.
otmar
On Wednesday, 20 April 2016 09:41:06, Otmar Lendl wrote:
> Thus: on the output side there is not just the question of the transformation to cybox/csv/xml/xarf/..., but also the question of aggregation: Which set of events should be grouped together?
I am also seeing this challenge.
Another problem with the approach is that some formats specify details of the transport level. For example, xarf needs a special mail structure, e.g. certain headers set and MIME parts that depend on the existence of some fields.
So it may not be possible to fully separate the "contents" part from the "transport" part.
One way to settle the design decision is to actually implement parts of this and see what works.
Another idea is to just have all mapping functions in one Python module, which could then be imported by parsers, outputs and other bots.
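A shared mapping module could look roughly like the sketch below. Everything here is hypothetical (the function names, the registry, the default field list), and JSON and CSV are used only because they need nothing beyond the standard library; real mappings to IODEF, CEF or xarf would be registered the same way.

```python
import csv
import io
import json


def to_json(event: dict) -> str:
    """Map one event to a JSON string."""
    return json.dumps(event, sort_keys=True)


def to_csv_row(event: dict, fields=("source.ip", "classification.type")) -> str:
    """Map one event to a single CSV row; missing fields become empty cells."""
    buf = io.StringIO()
    csv.writer(buf).writerow([event.get(f, "") for f in fields])
    return buf.getvalue().rstrip("\r\n")


# Registry that any bot could import and look a mapping up in.
FORMATTERS = {"json": to_json, "csv": to_csv_row}


def render(event: dict, fmt: str) -> str:
    """Look up the mapping function for `fmt` and apply it to the event."""
    try:
        formatter = FORMATTERS[fmt]
    except KeyError:
        raise ValueError(f"no mapping function registered for {fmt!r}") from None
    return formatter(event)


if __name__ == "__main__":
    event = {"source.ip": "192.0.2.1", "classification.type": "malware"}
    print(render(event, "csv"))
    print(render(event, "json"))
```

This keeps the mapping logic in one place, while formats that also dictate transport details would still need cooperation from the output bot.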
Best Regards, Bernhard