[Intelmq-dev] Discussion on intelmq output / transformation architecture

Tue Apr 19 23:48:33 CEST 2016

Hi folks,

an interesting discussion started on https://github.com/certtools/intelmq/issues/487

Basically, Pedro would like to send data to ArcSight in CEF format.

I would like to propose that we enhance our architecture to include "transformer bots".
Apart from the funny name, which reminds me of http://transformers.hasbro.com/en-us/bots,
this has a nice symmetry with the collector -> parser chain we have on the input side.

So what are we talking about?

Input side:
```

   Collector ---> Parser ---> .... pipeline

```

Currently on the output side it's like this:

```

  ... pipeline -> expert1 -> expert2 -> ... -> output bot 1

```

But this basically leaves the work of transcoding/transforming the DHO format to the output bots.
So far, this has been enough.

In the general case, it would be great to have a dedicated step that transforms the DHO into some other format, say CEF or IODEF.

A generic output bot could then take that data and send it over whatever mechanism it knows to a destination.

Example:

```

  ... pipeline -> expert1 -> transform 2 IODEF -> mail bot

or:

  ... pipeline -> expert1 -> transform 2 IODEF -> mail bot
                                              \
                                               \-> sftp bot

```

The question arose in https://github.com/certtools/intelmq/issues/487 as to where to store the output data.
I'd suggest creating a new "raw_output" field in the DHO for this.

A transformer bot shall write its output format to the raw_output field (base64-encoded). An output bot shall take that data, decode the base64 and send it via its own mechanism.
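
To make the hand-off concrete, here is a rough sketch of the idea in plain Python. It does not use the real IntelMQ bot classes: events are plain dicts, and `transformer_bot`, `output_bot` and the `send` callables are hypothetical names made up for illustration. The CEF-style line is heavily simplified and not a complete CEF implementation.

```python
import base64


def transformer_bot(event: dict) -> dict:
    """Hypothetical transformer: render the event and store it base64-encoded."""
    # Simplified CEF-style line, for illustration only (assumed layout).
    rendered = "CEF:0|IntelMQ|intelmq|1.0|{type}|{type}|5|src={ip}".format(
        type=event.get("classification.type", "unknown"),
        ip=event.get("source.ip", ""),
    )
    out = dict(event)  # copy, so the incoming event is not mutated
    out["raw_output"] = base64.b64encode(rendered.encode("utf-8")).decode("ascii")
    return out


def output_bot(event: dict, send) -> None:
    """Hypothetical output bot: decode raw_output and pass it to a transport."""
    payload = base64.b64decode(event["raw_output"])
    send(payload)  # the transport (mail, sftp, ...) never looks at the format


# Stand-ins for a mail bot and an sftp bot: they just collect what they get.
mail_sent, sftp_sent = [], []

event = {"source.ip": "192.0.2.1", "classification.type": "botnet drone"}
transformed = transformer_bot(event)

# Fan-out: the same transformed event goes to both output bots.
output_bot(transformed, mail_sent.append)
output_bot(transformed, sftp_sent.append)
```

The point of the sketch is that the output bots only need to know about raw_output and base64; which format is inside is entirely the transformer's business.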

What do you think?
Would this work? Do you see any serious problem with this approach?

Best,
a.