Why is Splunk combining lines in a small percentage of random events? JSON logs in particular.

Builder

I populate a log file that has one JSON event per line. Each event is about 1,500 bytes. The majority of the events are parsed correctly into JSON in the Splunk GUI.

However, a small subset of the events is not rendered in the red, syntax-highlighted JSON format in the GUI. Their fields ARE extracted correctly, and Splunk DOES recognize that the source log file in question has over 150 lines, but it indexes them as one huge raw event, usually 130 lines and up. (Some of the successfully parsed JSON events came from files over 1,000 lines long, so file length alone does not seem to be the cause.)

I'm checking my logging script, but I wanted to make sure I wasn't missing any obvious settings or hitting some kind of threshold within Splunk. I would have assumed Splunk had to see one giant line in order to parse it incorrectly, but that is not the case: it sees all of the line breaks just fine. In fact, searching for a newline ("\n") is how I find these un-JSON'ed events today.
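One way to rule out the logging script is to confirm that every line of the file is a single, complete JSON object. The following is only a minimal sketch of such a check; the function name `check_json_lines` and the sample events are hypothetical and not taken from the actual logs.

```python
import json


def check_json_lines(lines):
    """Return (line_number, problem) pairs for lines that are not one JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Strip only the line terminator so stray whitespace is still visible to the parser.
        text = line.rstrip("\r\n")
        if not text:
            problems.append((number, "empty line"))
            continue
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical sample: two complete events and one that was cut off mid-string.
sample = [
    '{"timestamp": "2015-01-01T00:00:00Z", "msg": "ok"}\n',
    '{"timestamp": "2015-01-01T00:00:01Z", "msg": "cut off\n',
    '{"timestamp": "2015-01-01T00:00:02Z", "msg": "ok"}\r\n',
]

for number, problem in check_json_lines(sample):
    print(f"line {number}: {problem}")
```

To run it against a real file, pass an open file handle instead of the list, for example `check_json_lines(open(path, newline=""))`; `newline=""` keeps the original line endings intact so mixed "\r\n" and "\n" terminators are not hidden.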

Any thoughts on what I can check?

Sourcetype definition looks like this:

NO_BINARY_CHECK = 1
pulldown_type = true
SHOULD_LINEMERGE = false
TRUNCATE = 200000
TIME_PREFIX = \"timestamp\"
TZ = UTC

Another revelation: Splunk sees the newlines but treats the whole thing as one big line anyway and hits the 200,000-byte TRUNCATE limit from above. What the heck? linecount is always > 1, and the event itself says "show all 150 lines" right in the GUI.

Revelation 2: I tried a working and a non-working source file in another Splunk instance. Both were parsed as JSON just fine in my local, personal instance of Splunk. I have no idea what's going on here...

Re: Why is Splunk combining lines in a small percentage of random events? JSON logs in particular.

Legend

You need to specify KV_MODE=json in your props.conf. See here for details

http://docs.splunk.com/Documentation/Splunk/6.2.1/admin/Propsconf

Re: Why is Splunk combining lines in a small percentage of random events? JSON logs in particular.

Builder

Is there a specific reason for this? None of our other JSON sourcetypes have needed this so far.

Re: Why is Splunk combining lines in a small percentage of random events? JSON logs in particular.

Builder

We tried this but sadly it had no effect.

Re: Why is Splunk combining lines in a small percentage of random events? JSON logs in particular.

SplunkTrust

Try adding line-breaking properties such as BREAK_ONLY_BEFORE to your props.conf. Can you provide some sample logs?

Re: Why is Splunk combining lines in a small percentage of random events? JSON logs in particular.

Builder

Found a fix. To summarize, Splunk was combining random events into one big long line/event. Most of the events from different source logs in the same sourcetype were fine. We couldn't figure out why Splunk messed up only a few of them.

The fix was to add a manual LINE_BREAKER to our props.conf file for each sourcetype with the issue. We used this one specifically:

LINE_BREAKER = ([\r\n|\n]+)

It breaks the events on \r\n or \n. Still not sure why some events worked by default and some didn't.
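To see what that pattern actually matches, here is a small sketch that uses Python's `re` module as a stand-in for Splunk's regex engine (an assumption; the two should agree for a pattern this simple). Inside square brackets `|` is a literal character rather than alternation, so the posted class also matches a pipe. The tighter `([\r\n]+)` and the helper `split_events` are illustrative additions, not part of the original fix.

```python
import re

# The pattern from the post. Inside [...] the "|" is literal,
# so this class matches "\r", "\n", or "|".
posted = re.compile(r"([\r\n|\n]+)")

# Assumption: a tighter class that matches only line-ending characters.
tight = re.compile(r"([\r\n]+)")


def split_events(pattern, raw):
    """Split raw text at each match, discarding the captured separator text."""
    parts = pattern.split(raw)
    # With one capture group, re.split puts separators at odd indexes;
    # keep the even indexes (the events) and drop empty leftovers.
    return [part for part in parts[::2] if part]


# Hypothetical raw data with mixed line endings and a pipe inside a value.
raw = '{"id": 1}\r\n{"id": 2}\n{"msg": "a|b"}\n'

print(split_events(posted, raw))  # the third event is split at the pipe
print(split_events(tight, raw))   # three intact events
```

If the events can contain a pipe character, the tighter class avoids splitting them mid-event.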

Remember that debug/refresh does not apply index-time settings like LINE_BREAKER. You'll need to restart splunkd for the change to take effect.
