I have an issue with event line breaking in an access log that I hope someone can guide me on.
We have an access log where every line is an event. The issue: events are randomly broken mid-line.
Looking at the source file on the app server, event breaking is always correct.
When using "Show source" in the Splunk GUI, it shows the wrong event breaking.
Some more details on our config:
• We use an indexer cluster (4 nodes) with auto load balancing
• We use "useAck"
Discoveries summarized:
• Line breaking is always correct when looking at the source file on the application servers.
• Using "Show source" in Splunk always shows the wrong event breaking when the error occurs.
• When the issue occurs, the break lands a random number of bytes into the line.
• There are 4 application servers generating the access logs, each running a Splunk forwarder; the errors occur on all of the servers, equally distributed.
• The access log files on the servers are rotated at midnight every night.
• The occurrence rate of the issue correlates with volume (the issue is seen more often during high-volume periods than during low-volume periods).
• The issue does not correlate with the times the forwarder switches indexers (as part of the autoLB functionality). I tested this based on a hypothesis that the wrong event breaking occurs when the forwarder switches to the next indexer.
• The ratio of errors to total count varies between 0.05 and 0.12 percent, and correlates with an increased number of events.
One sample event with a wrong event break:
[09/Jun/2017:10:15:36 +0200] clientip=... user=- verb=GET uri=/……&WIDTH=256&HEIGHT=256 useragent="Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, li
ke Gecko) Chrome/58.0.3029.110 Safari/537.36 OPR/..." origin=- referer=http://www.***/ respcode=200 resptime=5 xfwdfor="..., ..." guid=b3812af9-6715-a492-0afb-73ddbe57428d
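To quantify the error ratio described above, a search along these lines can count events whose raw text does not start with the expected timestamp (the index and sourcetype names here are placeholders, and the regex assumes every good event starts with the bracketed timestamp):

```
index=web sourcetype=appname-access-log
| eval broken=if(match(_raw, "^\[\d{2}/\w{3}/\d{4}:"), 0, 1)
| stats count AS total sum(broken) AS broken_count
| eval broken_pct=round(100*broken_count/total, 3)
```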
I have added the following props.conf settings (on the indexers) for the sourcetype:
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE_DATE = true
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
LOOKUP-dropdowns = dropdownsLookup host OUTPUT unixcategory unixgroup
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = true
TIME_PREFIX = \[
TRUNCATE = 10000
detect_trailing_nulls = false
maxDist = 100
Just use this as your sourcetype and test the line breaking once. Remove any other line-breaker-related configuration apart from the following and try:
[appname-access-log]
TIME_PREFIX = \[
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = %d/%b/%Y:%T %z
LINE_BREAKER = ([\n\r]+)(?=\[\d+\/\w+\/\d+\:\d+)
SHOULD_LINEMERGE = False
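A quick way to sanity-check a LINE_BREAKER regex outside of Splunk is to apply it with Python's `re.split`, since Splunk uses the first capture group as the event delimiter. This is only a sketch; the sample lines below are made up, and it only checks the regex, not Splunk's actual pipeline behavior:

```python
import re

# The LINE_BREAKER suggested above: break on newlines that are
# immediately followed by a "[dd/Mon/yyyy:hh..." timestamp.
LINE_BREAKER = r"([\n\r]+)(?=\[\d+/\w+/\d+:\d+)"

raw = (
    "[09/Jun/2017:10:15:36 +0200] verb=GET uri=/a respcode=200\n"
    "[09/Jun/2017:10:15:37 +0200] verb=GET uri=/b respcode=200\n"
)

# re.split keeps the captured delimiter; drop the delimiter-only and
# empty pieces to get the list of events.
events = [part for part in re.split(LINE_BREAKER, raw) if part.strip()]

for e in events:
    print(e)
```

With the sample above this yields two events, each starting at its own timestamp, which is the behavior the suggested stanza is meant to enforce.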
Why are you line merging? Are these events not all on one line? Are you using a custom line breaker?
Are you using a UF on the app servers? Is there a heavy forwarder involved in the data path to the indexers?
Keep in mind that examining the source file after the fact is not a true representation of what was going on, as Splunk's tail reader monitors in close to real time. The general behavior I have found is that there was a break in the file write, so Splunk thinks the line is done or the file has been closed.
I have this situation at one of my clients, described in another Answers post, and am trying to find a surefire way to identify the root cause. https://answers.splunk.com/answers/545031/sending-logs-over-scp-to-heavy-forwarder-why-does-1.html#c...
Are you using forceTimeBasedAutoLB?
forceTimebasedAutoLB = [true|false]
* Forces existing streams to switch to a newly elected indexer every autoLB cycle.
* On universal forwarders, use the EVENT_BREAKER_ENABLE and EVENT_BREAKER settings in props.conf rather than forceTimebasedAutoLB for improved load balancing, line breaking, and distribution of events.
* Defaults to false.
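Per the spec text above, on a universal forwarder the event-breaker settings go in props.conf. A minimal stanza might look like this (the sourcetype name is an example, and the regex is the one suggested earlier in this thread):

```
# props.conf on the universal forwarder
[appname-access-log]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\n\r]+)(?=\[\d+\/\w+\/\d+\:\d+)
```

With this in place, the UF only switches indexers at event boundaries, so autoLB rotation cannot split an event mid-line.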
Also, do you see either of these symptoms on the forwarder(s) involved?
1) Far too many files getting dropped at the same time. There will be a log message "File descriptor cache is full (100), trimming".
2) TcpOutputProc WARN messages regarding read timeouts.
If none of the above apply, feel free to gather debug output from TailReader, WatchedFile, and TailingProcessor and open a case.
Use the CLI commands so that you don't need to restart Splunk.
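For reference, the log channels mentioned above can usually be raised to DEBUG at runtime with the `splunk set log-level` CLI. This is only a sketch; it assumes a standard Splunk install and is run as the splunk user from $SPLUNK_HOME/bin:

```
# Raise the tailing-related channels to DEBUG; no restart required
splunk set log-level TailReader -level DEBUG
splunk set log-level WatchedFile -level DEBUG
splunk set log-level TailingProcessor -level DEBUG
# Revert when done collecting the diag
splunk set log-level TailReader -level INFO
splunk set log-level WatchedFile -level INFO
splunk set log-level TailingProcessor -level INFO
```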
Thank you mmodestino_splunk
On the app servers there is a UF, and there is no HF between the UFs and the indexers.
1) I have tested the following on the forwarder:
- forceTimebasedAutoLB = false
- EVENT_BREAKER_ENABLE = true
The issue remained the same.
2) I have not found any errors of type "File descriptor cache is full (100), trimming".
3) I have not found any warnings of type TcpOutputProc.
I will turn on debugging and open a case.
Were you able to get this issue resolved? I'm seeing this issue on a log that is generated on the search head, with no forwarder involved. The lines themselves are thousands of characters long, so I think I have the same issue as you.