
Events Delayed / Not Split Correctly

Communicator

Questions

  1. Can anyone point me to a (really) detailed description of how the Forwarder/Indexer work?
  2. When does Splunk actually split files into events?
  3. Why are my events delayed when using MUST_BREAK_AFTER and time_before_close?
  4. Is it possible to tell Splunk to ignore EOF completely or until the file is not written to for a specified amount of time?

We are using Splunk 6.2 by the way.

Description

We are writing application (.NET Framework) logs to a file on a Windows system. Occasionally some events are split at seemingly random locations. When the log file containing a split event is imported on another Splunk instance, the event is not split. Neither the log file size, the location of the split events in the log file, nor the size of the events is consistent. Further investigation led me to this question: 1. The problem explained there seems to be similar, though in our case events are not split periodically and we do not use/set the time_before_close parameter. Basically, due to buffering etc., it is possible that only part of an event is written to the file before the remaining data is added, which may be causing the problem.

So I created a simple test setup to check the behavior of Splunk when writing partial events to a file:

inputs.conf

[monitor://C:\Logs]
disabled = false
host = MyHost
index = test
whitelist = ^.*\.log$
sourcetype = mySourcetype

props.conf

[mySourcetype]
CHARSET = CP1252
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = \[Start\]
MAX_EVENTS = 1000000000
TIME_PREFIX = ([\r\n]|\s)timestamp="
MAX_TIMESTAMP_LOOKAHEAD = 64

Then I created the file with the following content (a partial event):

C:/Logs/MyLog.log


[Start]
Test
I did this while running a real-time search. After a few seconds a new event was reported. So I assume that, because the file has not been written to for 3 seconds (time_before_close defaults to 3), Splunk decides this must be a complete event, even though the documentation for BREAK_ONLY_BEFORE states that:

Splunk creates a new event only if it encounters a new line that matches the regular expression 2.
There is no new line matching the regular expression at the end of the file, not even a new line character, only EOF.
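
The partial-write test above can also be scripted instead of done by hand. The following is only a minimal sketch: the helper name write_in_chunks, the chunk contents and the delay are my own assumptions, not part of the original setup.

```python
import time
from pathlib import Path


def write_in_chunks(path, chunks, delay_seconds):
    """Append each chunk to `path`, closing the file and pausing between writes.

    Hypothetical helper for reproducing a partially written event.
    """
    path = Path(path)
    for i, chunk in enumerate(chunks):
        # Append and close so the data is on disk before the pause;
        # newline="" keeps "\n" from being translated on Windows.
        with open(path, "a", encoding="cp1252", newline="") as handle:
            handle.write(chunk)
        # Pause between chunks, but not after the last one.
        if i < len(chunks) - 1:
            time.sleep(delay_seconds)
```

Calling write_in_chunks(r"C:\Logs\MyLog.log", ["[Start]\nTest", " continued\n"], 5) would leave the file ending in a partial event, without a trailing newline, for longer than the 3-second default mentioned above before the rest of the event is appended.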
So increasing the value of the time_before_close parameter could fix the problem, except that the last event would then be delayed for the specified time, as there won't be a new event after the last one. Therefore we will also need to use MUST_BREAK_AFTER to ensure the last event can be reported immediately:

inputs.conf

[monitor://D:\LogFiles\Test]
disabled = false
host = MyHost
index = test
whitelist = ^.*\.log$
sourcetype = mySourcetype
time_before_close = 300

props.conf

[mySourcetype]
CHARSET = CP1252
BREAK_ONLY_BEFORE_DATE = false
BREAK_ONLY_BEFORE = \[Start\]
MUST_BREAK_AFTER = \[End\]
MAX_EVENTS = 1000000000
TIME_PREFIX = ([\r\n]|\s)timestamp="
MAX_TIMESTAMP_LOOKAHEAD = 64
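
To make clear what I expect from combining BREAK_ONLY_BEFORE and MUST_BREAK_AFTER, here is a simplified model of the line merging in Python. It is only a sketch of my understanding, not Splunk's actual implementation; the function name split_events and its return shape are assumptions made for illustration.

```python
import re


def split_events(lines, break_only_before=None, must_break_after=None):
    """Group lines into events; return (complete_events, pending_lines).

    Simplified model only: it ignores timestamps, MAX_EVENTS and timeouts.
    """
    before = re.compile(break_only_before) if break_only_before else None
    after = re.compile(must_break_after) if must_break_after else None
    events, current = [], []
    for line in lines:
        # A line matching break_only_before starts a new event,
        # which closes whatever has been collected so far.
        if before and before.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
        # A line matching must_break_after closes the event immediately.
        if after and after.search(line):
            events.append("\n".join(current))
            current = []
    # Leftover lines have no terminator yet; in this model they stay
    # pending until more data arrives or the file is considered closed.
    return events, current
```

In this model, an event that ends with a line matching \[End\] is complete as soon as that line is read, while a trailing "[Start]" block without "[End]" stays pending. That is the behaviour I was hoping for from the configuration above.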

Now events are only reported after 300 seconds. I assumed the Forwarder waits for the specified time before reading new events. To confirm this assumption, I checked the FileStatus as described in 3. Strangely enough, immediately after I add complete and partial events to my log file, the file position increases and the percent field goes up to 100. So new content is processed immediately, though the events do not show up in Splunk Web (and are not added to the event count of the index).

References

Re: Events Delayed / Not Split Correctly

Esteemed Legend

There is an excellent wiki that tells more than most people can grok:
http://wiki.splunk.com/Community:HowIndexingWorks


Re: Events Delayed / Not Split Correctly

Communicator

Thanks a lot - that's exactly what I was looking for.
