I'm trying to index the output of the Nessus vulnerability scanner in NBE format.
There are two types of events in Nessus NBE output: results and timestamps. They look like the following:
timestamps||10.10.10.31|host_end|Tue May 22 17:28:33 2012|
results|10.10.10|10.10.10.100|epmap (135/tcp)|10736|Security Note|Synopsis
I'm finding that hosts in the 10.x.x.x network range are having their timestamps extracted improperly. In the first event above, for example, the _time value is 10/31/10 17:28:33 2012.
I can't do field extractions using the "delims" option in transforms.conf since the number of fields is different between timestamp and results events. I've tried the following two extractions, but Splunk is still being confused by some of the hosts in the 10.x.x.x range:
EXTRACT-timestamps = (?i)(?P<action>[^\|]+)\|\|(?P<dest>[^\|]+)\|(?P<result_type>[^\|]+)\|
EXTRACT-results = (?i)(?P<action>[^\|]+)\|(?P<network>[^\|]+)\|(?P<dest>[^\|]+)\|(?P<service_name>[^\|]+)\|(?P<nessus_id>[^\|]+)\|(?P<result_type>[^\|]+)\|(?P<synopsis>[^\|]+)
Searches that include events with the bad timestamps are also incredibly slow, and I see the following warning:
Field extractor name=EXTRACT-timestamps is unusually slow (max single event time=1891ms, probes=9 warning max=1000ms)
So... How do I make Splunk see those timestamps correctly? Why is the EXTRACT-timestamps extraction taking so long?
You need to tell Splunk where in your event it should look for a timestamp, and how it should be parsed. More information on that is available in the docs: http://docs.splunk.com/Documentation/Splunk/latest/Data/Configuretimestamprecognition
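For the two sample events in the question, a minimal props.conf sketch might look like the following. The sourcetype name nessus_nbe is an assumption (use whatever your input actually assigns), and the lookahead value is simply the length of the sample timestamp plus one. These are index-time settings, so they belong on the instance that parses the data and only affect newly indexed events.

```ini
# props.conf (sketch; the stanza name "nessus_nbe" is hypothetical)
[nessus_nbe]
# Only look for a timestamp right after the first four pipe-delimited
# fields of a "timestamps" event, so the IP address is never considered.
TIME_PREFIX = ^timestamps\|\|[^|]*\|[^|]+\|
# Matches strings like: Tue May 22 17:28:33 2012
TIME_FORMAT = %a %b %d %H:%M:%S %Y
# The sample timestamp is 24 characters long; stop scanning shortly after.
MAX_TIMESTAMP_LOOKAHEAD = 25
```

With a prefix like this, "results" events never match TIME_PREFIX, so Splunk should not try to read a date out of their IP addresses.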
As for your extraction regex for the timestamp type events, you probably want to add an initial caret (^) in the regex so the regex engine doesn't bother searching for matches that start after the beginning of the event.
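As a rough sketch, the anchored versions of the two extractions from the question could look like this. The stanza name nessus_nbe is an assumption, and the (?i) flag is dropped here only because nothing in these patterns is case-sensitive. Without the caret, an event that fails to match at position 0 (for example a long "results" event tested against the timestamps pattern) gets retried from every later character, which is the likely source of the slowness warning.

```ini
# props.conf (sketch; the stanza name "nessus_nbe" is hypothetical)
[nessus_nbe]
# The leading ^ lets the regex engine give up after one failed attempt
# instead of retrying from every later position in the event.
EXTRACT-timestamps = ^(?P<action>[^\|]+)\|\|(?P<dest>[^\|]+)\|(?P<result_type>[^\|]+)\|
EXTRACT-results = ^(?P<action>[^\|]+)\|(?P<network>[^\|]+)\|(?P<dest>[^\|]+)\|(?P<service_name>[^\|]+)\|(?P<nessus_id>[^\|]+)\|(?P<result_type>[^\|]+)\|(?P<synopsis>[^\|]+)
```

EXTRACT settings are applied at search time, so a change like this takes effect on already-indexed events without re-indexing.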
From what I read in that doc, the TIME_PREFIX is probably the setting I'm looking for. But it says that if the regex doesn't match, the event doesn't get a timestamp.
In my experience with Nessus results, Splunk will usually extract the timestamp from the host_start event and then reuse that timestamp on subsequent "results" events until it sees another event with a timestamp in it.
Does that mean that the "results" events, which don't have a timestamp, won't have a _time value if the TIME_PREFIX regex doesn't match?
No, all indexed events carry a _time value. If Splunk doesn't find a timestamp it assigns one based on a number of rules, which are described here: http://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps