Dear all,
as a short announcement, we are currently starting to work on parsers for the following Shadowserver feeds.
- Drone [Done]
- Microsoft Sinkhole
- Sinkhole HTTP Drone
- DNS Open Resolvers
- NTP Monitor [Done]
- Open Portmapper
- Open CharGen
- Open Elasticsearch
- Open IPMI
- Open MDNS
- Open Memcached [Done]
- Open MongoDB
- Open MS-SQL
- Open NetBIOS
- Open Redis
- Open SNMP
- Open SSDP
- SSL FREAK
- SSL POODLE [Done]
We expect them to be ready by the end of this week.
BR Dustin
Dear all,
as announced, we restructured the shadowserver parser.
Please have a look at https://github.com/Intevation/intelmq/tree/shadowserver-feeds/intelmq/bots/p...
especially the file config.py.
The file contains a bunch of mappings of the feeds below. We are not sure if the mappings are correct.
Can someone verify this and, if possible, remove the appropriate todos, or correct the mapping?
BR Dustin
On Tuesday, 7 June 2016, 17:24:04, Dustin Demuth wrote:
> Dear all,
> as a short announcement, we are currently starting to work on parsers for the following Shadowserver feeds.
> - Drone [Done]
> - Microsoft Sinkhole
> - Sinkhole HTTP Drone
> - DNS Open Resolvers
> - NTP Monitor [Done]
> - Open Portmapper
> - Open CharGen
> - Open Elasticsearch
> - Open IPMI
> - Open MDNS
> - Open Memcached [Done]
> - Open MongoDB
> - Open MS-SQL
> - Open NetBIOS
> - Open Redis
> - Open SNMP
> - Open SSDP
> - SSL FREAK
> - SSL POODLE [Done]
> We expect them to be ready by the end of this week.
> BR Dustin
Dustin, All,
On 16.06.2016 12:27, Dustin Demuth wrote:
> The file contains a bunch of mappings of the feeds below. We are not sure if the mappings are correct.
> Can someone verify this and, if possible, remove the appropriate todos, or correct the mapping?
I've been maintaining parsers (in a different context) for Shadowserver feeds for the last 3 years. Based on that experience a few comments:
* Don't assume that the field-names will stay constant. Be prepared to support logic like "use 'ip' or 'srcip' for the IntelMQ 'source.ip'".
For the Drone feed, for example, I have the following mapping rules in our old system:

    # Mapping from local CSV column names to eventDB column names
    $self->{eventdb_map} = {
        asn          => "reported_asn",
        ip           => "src_ip",
        hostname     => "src_hostname",
        port         => "src_port",
        cc           => "dst_ip",
        cc_ip        => "dst_ip",
        cc_port      => "dst_port",
        cc_dns       => "dst_fqdn",
        timestamp    => "ts",
        url          => "dst_url",
        geo          => "reported_iso2cc",
        infection    => "malware",
        machine_name => "local_hostname",
        # older names
        "Timestamp"  => "ts",
        "Drone"      => "src_ip",
        "ASN"        => "reported_asn",
        "Geo"        => "reported_iso2cc",
        "Hostname"   => "src_hostname",
        "C&C"        => "dst_ip",
        "C&C DNS"    => "dst_fqdn",
        "C&C Port"   => "dst_port",
        "Infection"  => "malware",
    };
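On the IntelMQ side, the same idea could be expressed as a list of candidate column names per target field. The following is only a minimal Python sketch, not the actual parser code: the names `ALIASES` and `map_row` are hypothetical, the column names are taken from the Perl mapping above, and the target field names are assumed to follow the IntelMQ naming style.

```python
# Hypothetical alias table: target field -> candidate CSV column names,
# listed in order of preference (current name first, older names after).
# Column names come from the Perl mapping above; the target field names
# are assumptions for illustration.
ALIASES = {
    "source.ip": ["ip", "srcip", "Drone"],
    "source.asn": ["asn", "ASN"],
    "destination.ip": ["cc_ip", "cc", "C&C"],
    "destination.port": ["cc_port", "C&C Port"],
    "time.source": ["timestamp", "Timestamp"],
}


def map_row(row, aliases=ALIASES):
    """Build an event dict, using the first alias that has a non-empty value."""
    event = {}
    for target, candidates in aliases.items():
        for column in candidates:
            value = row.get(column)
            if value not in (None, ""):
                event[target] = value
                break  # stop at the first usable column for this target
    return event
```

With such a table, a renamed column only requires adding one more alias instead of touching the parser logic.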
* I see you support a fixup-function for each attribute. Yes, this is needed but potentially not good enough. The reason is that you might need to manipulate multiple fields together, e.g. it varies by feed whether C&C URLs are transmitted as full URL or split up in proto/port/hostname/path. If you want to unify these fields, a single function per attribute will not do.
Here is code from one of my parsers (not shadowserver, this is for Virustracker) to demonstrate this point.
    if (exists($row->{reported_asn})) {
        $row->{reported_asn} =~ s/^AS(\d+)\s*.*/$1/;
    }

    # the port alternatives are grouped so that both anchors apply to each of them
    if (($row->{Type} eq 'HTTP') and $row->{RequestPath} and $row->{dst_fqdn}
        and $row->{"dst_port"} and ($row->{"dst_port"} =~ /^(?:80|443)$/)) {
        $row->{RequestPath} =~ s,^/?,/,;    # make sure request starts with /
        $row->{"dst_url"} =
            (($row->{"dst_port"} eq '443') ? 'https://' : 'http://')
            . ($row->{dst_fqdn} ? $row->{dst_fqdn} : "")
            . ($row->{RequestPath} ? $row->{RequestPath} : "");
    }

    # for udp p2p botnets, the destination IP address is encoded in the Domain parameter
    if (($row->{Type} =~ /P2P/) and $row->{dst_fqdn}
        and ($row->{"dst_fqdn"} =~ /^([\d.]+):/)) {
        $row->{"dst_ip"} = $1;
        delete($row->{"dst_fqdn"});
    }

    # move IP-addresses from fqdn to ip field
    if ($row->{dst_fqdn} and ($row->{"dst_fqdn"} =~ /^([\d.]+)$/)) {
        $row->{"dst_ip"} = $1;
        delete($row->{"dst_fqdn"});
    }
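For a Python parser, the equivalent of the last two rules would be a hook that receives the whole row rather than a single attribute. The sketch below is only an illustration under assumptions: `fix_destination` is a hypothetical name, the row is assumed to be a plain dict with the same keys as in the Perl code, and the address check uses the standard `ipaddress` module, which is stricter than the `[\d.]+` pattern above.

```python
import ipaddress


def fix_destination(row):
    """Row-level fixup: move an IP address out of dst_fqdn into dst_ip."""
    fqdn = row.get("dst_fqdn")
    if not fqdn:
        return row

    candidate = fqdn
    # For P2P rows the value may look like "address:port"; keep the address part.
    if "P2P" in (row.get("Type") or "") and ":" in fqdn:
        candidate = fqdn.split(":", 1)[0]

    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return row  # a real hostname (or something else): leave it untouched

    row["dst_ip"] = candidate
    del row["dst_fqdn"]
    return row
```

Because the function sees the complete row, it can read one field (`Type`) while rewriting two others, which a per-attribute conversion cannot do.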
HTH,
otmar
Dear Otmar, All
thank you very much for your detailed feedback!
On Thursday, 16 June 2016, 13:04:00, Otmar Lendl wrote:
> I've been maintaining parsers (in a different context) for Shadowserver feeds for the last 3 years. Based on that experience a few comments:
> - Don't assume that the field-names will stay constant. Be prepared to support logic like "use 'ip' or 'srcip' for the IntelMQ 'source.ip'".
We have already seen this phenomenon; I guess the most recent change was "cc_ip". That's one of the reasons I extracted the mappings from the parser code.
> For the Drone feed, I e.g. have the following mapping rules in our old system:
>
>     # Mapping from local CSV column names to eventDB column names
>     $self->{eventdb_map} = {
>         asn          => "reported_asn",
>         ip           => "src_ip",
>         hostname     => "src_hostname",
>         port         => "src_port",
>         cc           => "dst_ip",
>         cc_ip        => "dst_ip",
>         cc_port      => "dst_port",
>         cc_dns       => "dst_fqdn",
>         timestamp    => "ts",
>         url          => "dst_url",
>         geo          => "reported_iso2cc",
>         infection    => "malware",
>         machine_name => "local_hostname",
>         # older names
>         "Timestamp"  => "ts",
>         "Drone"      => "src_ip",
>         "ASN"        => "reported_asn",
>         "Geo"        => "reported_iso2cc",
>         "Hostname"   => "src_hostname",
>         "C&C"        => "dst_ip",
>         "C&C DNS"    => "dst_fqdn",
>         "C&C Port"   => "dst_port",
>         "Infection"  => "malware",
>     };
This seems to be equivalent to our mapping.
> - I see you support a fixup-function for each attribute. Yes, this is needed but potentially not good enough. The reason is that you might need to manipulate multiple fields together, e.g. it varies by feed whether C&C URLs are transmitted as full URL or split up in proto/port/hostname/path. If you want to unify these fields, a single function per attribute will not do.
Yes, right now only one parameter is evaluated. I am aware that more complex operations might be required in the near future. I've also seen this requirement for the fqdn / url fields.
[see: https://github.com/certtools/intelmq/issues/524#issue-155435422, last point]
I'm not sure whether deducing the correct information from the feed will work as expected. Even with our limited amount of data I could already see that not every record contains all the information needed to calculate the correct value (the protocol may be missing, or HTTPS on port 80 might be possible). By calculating these values, one could make false assumptions.
Nevertheless, it seems that this approach works out for you, at least for Virustracker. This is great news.
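To make the concern about false assumptions concrete, one option would be to build a URL only when the scheme is stated explicitly, and to make any guessing from the port an opt-in. This is just a rough Python sketch under assumptions: `build_url` and its parameters are hypothetical and not part of IntelMQ, and ports are assumed to be numeric.

```python
# Default ports per scheme, and the reverse lookup used only for opt-in guessing.
DEFAULT_PORTS = {"http": 80, "https": 443}
SCHEME_BY_PORT = {80: "http", 443: "https"}


def build_url(host, path, scheme=None, port=None, guess_from_port=False):
    """Return a URL, or None when there is not enough reliable information."""
    if not host or not path:
        return None
    port = int(port) if port not in (None, "") else None

    if scheme is None and guess_from_port:
        # This is a guess: HTTPS on port 80 would be labelled as HTTP here.
        scheme = SCHEME_BY_PORT.get(port)
    if scheme is None:
        return None  # leave the split fields as they are

    netloc = host
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc = "{}:{}".format(host, port)  # keep non-default ports visible
    if not path.startswith("/"):
        path = "/" + path
    return "{}://{}{}".format(scheme, netloc, path)
```

With `guess_from_port=False`, a record without a protocol simply yields no URL, so nothing is invented; a feed where the port is known to be reliable could enable the guess per feed.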
> HTH,
Yes, very much!
BR Dustin