Getting Data In

Splunk Universal Forwarder missing events

Path Finder

Hi all,

Have you ever seen a UF missing events? I’ve observed some of our UFs missing ~8 seconds of events and then picking up again halfway through an event. The gaps are creating some muddy data, and it doesn’t seem to be limited to one server; I’ve got a list of 100 or so hosts across all of our environments and their corresponding Splunk clusters.

Here's a three-line example of what Splunk is seeing in the source (/app/search/show_source?blah). I've been able to manually confirm that there is a gap and that there are plenty of logs in between.

2017-12-03 22:25:37 GET /Something/Something/1 from=2017-12-02&to=2017-12-04 80 - 0.0.0.0 HTTP/1.1 - - Some.url.was.here.com.au 200 0 0 00000 000 00 - HasedKeyWasHere ServiceName -
0.0.0.0 HTTP/1.1 - - ome.url.was.here.com.au 200 0 0 000 000 0 - HasedKeyWasHere ServiceName -
202017-12-03 22:25:45 GET /Something/Something/1 from=2017-12-02&to=2017-12-04 80 - 0.0.0.0 HTTP/1.1 - - Some.url.was.here.com.au 200 0 0 00000 000 00 - HasedKeyWasHere ServiceName -

I've tried this with and without line-breaking logic in props.conf to see if it would make any difference, with no success. Which, in hindsight, is not entirely surprising.
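For context, the sort of line-breaking stanza I mean is roughly this (illustrative only, not our exact settings; the sourcetype name and regexes are placeholders):

# props.conf - rough illustration of the line-breaking attempt
[iis]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19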

It's probably worth mentioning that these are all IIS logs being forwarded to a 6-peer-node indexer cluster with no heavy forwarders in between.


Re: Splunk Universal Forwarder missing events

SplunkTrust

Hi @oscarminassian,

I suspect a parsing issue in your case. Have you tried searching All Time for those missing events with the timestamp in your query? What sourcetype are you using for the IIS logs?
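Something along these lines, run with the time picker set to All Time, should show which days those raw timestamps actually landed on (the index and sourcetype here are placeholders for whatever you use):

index=<your_index> sourcetype=<your_iis_sourcetype> "2017-12-03 22:25"
| eval indexed_day=strftime(_time, "%Y-%m-%d")
| stats count by indexed_day host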

Highlighted

Re: Splunk Universal Forwarder missing events

Path Finder

@harsmarvania57, sure did and no luck! 😞

Yeah, these are all the IIS sourcetype. I'm using the following search to separate the bad events from the good and getting lots of results.

index=web sc_status!=0
| regex sc_status!="^\d{3}$"
| regex sc_status!="^\d{4}$"
| regex _raw!="^\d{4}-\d{2}-\d{2}"
| stats count by sc_status host

Also worth mentioning that we're on 6.6.1


Re: Splunk Universal Forwarder missing events

SplunkTrust

Any parsing errors in splunkd.log on the indexers? And I assume you searched for index=web "2017-12-03" over All Time and didn't get any events that were ingested with the wrong date, am I right?
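If you want to hunt for those systematically, one rough way (just a sketch; it assumes every event should start with YYYY-MM-DD, and the sourcetype name is a placeholder) is to compare the indexed time against the date at the front of the raw event, over All Time:

index=web sourcetype=iis
| eval raw_date=substr(_raw, 1, 10)
| eval indexed_date=strftime(_time, "%Y-%m-%d")
| where raw_date!=indexed_date
| stats count by host raw_date indexed_date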

0 Karma
Highlighted

Re: Splunk Universal Forwarder missing events

Path Finder

No parsing errors that I can find. Initially I was unable to find any events that had come in with the wrong date/time, but I found some! They were hard to track down and I pretty much came across them by accident.


Re: Splunk Universal Forwarder missing events

SplunkTrust

Since the IIS sourcetype has indexed fields, if the incoming data doesn't match the sourcetype the data will fail to parse and will be lost.

I would temporarily test using another sourcetype that does not have indexed fields to see if the issue goes away... although only missing some events is strange. Is it possible that the log format is not 100% consistent?
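As a rough sketch of what I mean (the path and the throwaway sourcetype name are just placeholders), something like this on a single test forwarder would let you compare:

# inputs.conf on one test UF - illustrative only
[monitor://C:\inetpub\logs\LogFiles\W3SVC1\*.log]
sourcetype = iis_plain_test
index = web

# props.conf - a throwaway sourcetype with no indexed fields
[iis_plain_test]
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d %H:%M:%S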


Re: Splunk Universal Forwarder missing events

Path Finder

This was one of my first thoughts; we had a Puppet change a few months ago that removed the cookie field from the IIS logs. Oh boy, the data didn't like that. It went away after the log files rotated. I've since been able to verify that this isn't the case here and the logging is 100% uniform across our IIS fleet.
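For anyone wanting to do a similar check, counting the space-delimited fields per raw event is one rough way to spot format drift (index and sourcetype here are placeholders; it's a sanity check, not exhaustive):

index=web sourcetype=iis
| eval field_count=mvcount(split(_raw, " "))
| stats count by host field_count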


Re: Splunk Universal Forwarder missing events

SplunkTrust

Thanks. The splunkd.log file on either the indexer or the forwarder might drop some hints...


Re: Splunk Universal Forwarder missing events

Path Finder

Thanks for the insights. I did some back-searching in our S3 archive, and it looks like we've had this issue for a long, long time; it's just never been reported.


Re: Splunk Universal Forwarder missing events

Esteemed Legend

Are you sure that the events are missing? What I have seen happen many times is that the events are there, just split in the wrong place (mid-event), such that only one half of the event matches the TIME_PREFIX and TIME_FORMAT settings; the other half gets a different timestamp and is no longer right next to its halfsie, so it looks missing. The problem is usually buffering or chunking in the process that is writing the log file, and the only two solutions are to index the file after it rotates (after the writer is done writing to it) or to extend the amount of time that Splunk will wait for a write session to pause before assuming it is done, by increasing the time_before_close setting in inputs.conf:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf

time_before_close = <integer>
* Modification time delta required before the file monitor can close a file on
  EOF.
* Tells the system not to close files that have been updated in past <integer>
  seconds.
* Defaults to 3.
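
A minimal sketch of applying it, assuming a monitor stanza pointed at the default IIS log location (adjust the path and settings to your actual input; 30 is just an example value):

# inputs.conf on the UF - illustrative values only
[monitor://C:\inetpub\logs\LogFiles\W3SVC1]
sourcetype = iis
index = web
time_before_close = 30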