
Event breaking fails randomly

Explorer

I have an issue with event line breaking in an access log and hope someone can guide me.
We have an access log where every line is one event. The issue: events are randomly broken mid-line.

Looking at the source file on the app server, event breaking is always correct.
Using "Show source" in the Splunk GUI, however, shows the wrong event breaking.

Some more details on our config:
• We use an indexer cluster (4 nodes) with auto load balancing
• We use useACK

Discoveries, summarized:
• Line breaking is always correct when looking at the source file on the application servers.
• Using "show source" in Splunk always shows the wrong event breaking when the error occurs.
• When the issue occurs, the break in Splunk falls a random number of bytes into the line.
• There are 4 application servers generating the access logs, each running a Splunk forwarder; the errors occur on all of the servers, equally distributed.
• The access log files on the servers are rotated at midnight every night.
• The occurrence rate of the issue correlates with volume (the issue is seen more often during high-volume periods than low-volume periods).
• The issue does not correlate with the times the forwarder switches indexers (as part of the autoLB functionality). I tested this based on a hypothesis that the wrong event breaking occurs when the forwarder switches to the next indexer.
• The ratio of errors to total count varies between 0.05 and 0.12 percent, and correlates with the event volume.
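As an aside, the error ratio above can be tracked with a search along these lines (a hypothetical SPL sketch; the timestamp regex is derived from the sample event below, and your sourcetype may differ):

sourcetype="appname-access-log"
| eval broken=if(match(_raw, "^\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}"), 0, 1)
| stats sum(broken) AS broken count AS total BY host
| eval error_pct=round(100*broken/total, 3)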

One event sample with a wrong event break (note the break mid-way through the useragent value):
[09/Jun/2017:10:15:36 +0200] clientip=... user=- verb=GET uri=/……&WIDTH=256&HEIGHT=256 useragent="Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, li
ke Gecko) Chrome/58.0.3029.110 Safari/537.36 OPR/..." origin=- referer=http://www.***/ respcode=200 resptime=5 xfwdfor="..., ..." guid=b3812af9-6715-a492-0afb-73ddbe57428d

I have added the following props.conf (on the indexers) for the sourcetype:

[appname-access-log]
ANNOTATE_PUNCT = True
AUTO_KV_JSON = true
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
CHARSET = UTF-8
DATETIME_CONFIG = /etc/datetime.xml
HEADER_MODE =
LEARN_SOURCETYPE = true
LINE_BREAKER_LOOKBEHIND = 100
LOOKUP-dropdowns = dropdownsLookup host OUTPUT unix_category unix_group
MATCH_LIMIT = 100000
MAX_DAYS_AGO = 2000
MAX_DAYS_HENCE = 2
MAX_DIFF_SECS_AGO = 3600
MAX_DIFF_SECS_HENCE = 604800
MAX_EVENTS = 256
MAX_TIMESTAMP_LOOKAHEAD = 128
MUST_BREAK_AFTER =
MUST_NOT_BREAK_AFTER =
MUST_NOT_BREAK_BEFORE =
SEGMENTATION = indexing
SEGMENTATION-all = full
SEGMENTATION-inner = inner
SEGMENTATION-outer = outer
SEGMENTATION-raw = none
SEGMENTATION-standard = standard
SHOULD_LINEMERGE = true
TIME_PREFIX = [
TRANSFORMS =
TRUNCATE = 10000
detect_trailing_nulls = false
maxDist = 100
priority =
sourcetype =


Re: Event breaking fails randomly

Builder

Just use this as your sourcetype configuration and test the line breaking once. Remove any other line-breaking configuration apart from the below and try.

[appname-access-log]
TIME_PREFIX = \[
MAX_TIMESTAMP_LOOKAHEAD = 30
TIME_FORMAT = %d/%b/%Y:%T %z
LINE_BREAKER = ([\n\r]+)(?=\[\d+\/\w+\/\d+\:\d+)
SHOULD_LINEMERGE = False
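As a quick sanity check (a standalone sketch; the sample lines are made up, modelled on the access log in the question), the suggested LINE_BREAKER can be exercised in Python, since this regex subset behaves the same under PCRE and Python's re module:

```python
import re

# LINE_BREAKER suggested above: break on newlines only when the next
# line starts with a [dd/Mon/yyyy:hh...] style timestamp.
breaker = re.compile(r"([\n\r]+)(?=\[\d+/\w+/\d+:\d+)")

# Hypothetical sample data; the third line simulates a wrapped line
# that does NOT start with a timestamp.
raw = (
    "[09/Jun/2017:10:15:36 +0200] verb=GET respcode=200\n"
    "[09/Jun/2017:10:15:37 +0200] verb=GET respcode=200\n"
    "a wrapped continuation line without a timestamp\n"
    "[09/Jun/2017:10:15:38 +0200] verb=GET respcode=200"
)

# Splunk breaks events at the first capture group; re.split with a
# capturing group approximates that (it keeps the separators, which we
# drop as pure whitespace).
parts = breaker.split(raw)
events = [p for p in parts if p and not p.isspace()]

print(len(events))            # 3: the wrapped line stays glued to event 2
print("wrapped" in events[1])  # True
```

This demonstrates the intended behavior: a newline not followed by a timestamp does not break the event.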

Re: Event breaking fails randomly

Explorer

Thanks a lot!
I've tested what you suggested, but it did not work.
I'm now working through the comments from mmodestino_splunk.


Re: Event breaking fails randomly

Influencer

Why are you line merging? Are these events not all on one line? Are you using a custom line breaker?

Are you using a UF on the app server? Is there a Heavy Forwarder involved in the data path to the indexers?

Keep in mind that examining the source file after the fact is not a true representation of what was going on, since Splunk's tail reader monitors in close to real time. The general behavior I have found is that there was a pause in the file write, so Splunk thinks the line is done or the file has been closed.

I have this situation at one of my clients and in another Answers post, and am trying to find a surefire way to identify the root cause. https://answers.splunk.com/answers/545031/sending-logs-over-scp-to-heavy-forwarder-why-does-1.html#c...

Are you using forceTimeBasedAutoLB?
http://docs.splunk.com/Documentation/Splunk/6.6.1/Admin/Outputsconf

forceTimebasedAutoLB = [true|false]
* Forces existing streams to switch to newly elected indexer every
  AutoLB cycle.
* On universal forwarders, use the EVENT_BREAKER_ENABLE and
  EVENT_BREAKER settings in props.conf rather than forceTimebasedAutoLB
  for improved load balancing, line breaking, and distribution of events.
* Defaults to false.
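On a 6.5+ universal forwarder, the props.conf settings mentioned above would look something like this (a sketch only; the regex is borrowed from the LINE_BREAKER suggested elsewhere in this thread):

[appname-access-log]
EVENT_BREAKER_ENABLE = true
EVENT_BREAKER = ([\n\r]+)(?=\[\d+\/\w+\/\d+\:\d+)

This lets the UF choose event boundaries when switching indexers instead of cutting the stream mid-line.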

Also, do you see either of these symptoms on the forwarder(s) involved?

1) Far too many files being picked up at the same time; there will be a log message "File descriptor cache is full (100), trimming".
2) TcpOutputProc WARN messages regarding read timeouts.

If none of the above apply, prepare to open a case: gather debug logs from TailReader, WatchedFile, and TailingProcessor, then open the case.

http://docs.splunk.com/Documentation/Splunk/6.6.1/Troubleshooting/Enabledebuglogging

Use the CLI commands so that you don't need to restart Splunk.
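For reference, the CLI toggle looks roughly like this (channel names and syntax as given in the troubleshooting docs; verify against your Splunk version, and remember to revert to INFO when done):

splunk set log-level TailingProcessor -level DEBUG
splunk set log-level WatchedFile -level DEBUG
splunk set log-level TailReader -level DEBUG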


Re: Event breaking fails randomly

Explorer

Thank you mmodestino_splunk

On the app servers there is a UF, and there is no HF between the UFs and the indexers.
1) I have tested the following on the forwarder:
* outputs.conf
- forceTimebasedAutoLB = false
* props.conf
- EVENT_BREAKER_ENABLE = true

The issue remained the same.

2) I have not found any errors of the type "File descriptor cache is full (100), trimming".

3) I have not found any TcpOutputProc warnings about read timeouts.

I will turn on debugging and open a case.


Re: Event breaking fails randomly

Path Finder

Were you able to get this issue resolved? I'm seeing the same issue on a log generated on the search head, with no forwarder involved. The lines themselves are thousands of characters long, so I think I have the same problem as you.
