Splunk not breaking events on line break properly

jcfergus
Engager

Ok, I'm at my wits' end here. I have an application log which produces events of the format:

DEBUG | 2012-02-16 11:01:30,683 [http-10.0.0.1-8443-Processor6] SystemFile  - field1=value1 timestamp=2012-02-16 11:01:30.679 CST   field2=value2   field3=value3   field4=value4   field5= field6=value6   field7=A field value with spaces in it  field8=
DEBUG | 2012-02-16 11:01:32,457 [http-10.0.0.1-8443-Processor10] SystemFile  - field1=value1    timestamp=2012-02-16 11:01:32,450 CST   field2=value2   field3= field4=value4   field5= field6=value6   field7=Another field with spaces in it  field8=value8

They're basically tab-delimited name/value pairs, with nice neat newlines at the end of each line (I've verified the line breaks and tabs in a hex editor, and all events are written via the same log4j config). I -thought- I had it all parsing just fine, but the index-time parsing is not always splitting the events on newlines, so I end up with two (or three, or four, or five) log lines in one event. The lines have different timestamps, so it's not merging them because they share a timestamp (the two events above are a sanitized example of a pair that got rolled together). I'd suspect the cause is that the first one ends with an equals sign (no value), but plenty of events in the same log look identical and get split properly. I'm stumped.

My props.conf for the log source looks like:

[MySourceType]
LINE_BREAKER = ([\r\n]+)
REPORT-tab-kv-manual = tab-kv-manual
KV_MODE = NONE
TIME_PREFIX = DEBUG
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30

And my transforms.conf looks like:

[tab-kv-manual]
REGEX = (\t|- )([^=]+)=([^\t\n]*)
FORMAT = $2::$3
REPEAT_MATCH = true

Any suggestions?

thisissplunk
Builder

Did you ever figure this out? Having the same issue. Testing the explicit line breaker currently.

somesoni2
Revered Legend

What is your data format? Also, include "SHOULD_LINEMERGE=false" in props.conf along with LINE_BREAKER.
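
As a minimal sketch of that suggestion, the stanza from the question would gain one line. SHOULD_LINEMERGE defaults to true, so without it Splunk can re-merge lines that LINE_BREAKER has already split. Whether that is the cause here is an assumption; nobody in the thread confirmed it.

```ini
[MySourceType]
# Break the raw stream on newlines. The text matched by the first
# capture group is discarded and marks the event boundary.
LINE_BREAKER = ([\r\n]+)
# Treat each broken line as its own event (no line merging).
SHOULD_LINEMERGE = false
```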

kristian_kolb
Ultra Champion

I've been there as well, and while it looks like your LINE_BREAKER regex is correct, I think I remember that being a bit more explicit solved the issue:

LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+

Also, your TIME_PREFIX is wrong: it only matches DEBUG-level lines. It should be:

TIME_PREFIX = ^[A-Z]+\s+\|\s+
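
Put together with the settings already in the question, the stanza might look like the sketch below. It assumes every event starts with an uppercase log level followed by " | " and a date, as in the two samples. SHOULD_LINEMERGE = false comes from the other reply in this thread, and none of this has been tested against the real log.

```ini
[MySourceType]
# Break only where a newline is followed by "LEVEL | <digit>".
LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+
SHOULD_LINEMERGE = false
# The timestamp follows the log level and the pipe separator.
TIME_PREFIX = ^[A-Z]+\s+\|\s+
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30
# Search-time field extraction, unchanged from the question.
REPORT-tab-kv-manual = tab-kv-manual
KV_MODE = NONE
```

Line-breaking and timestamp settings are applied at index time, so a change like this only affects data indexed after it takes effect; events already in the index stay merged.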

Hope this helps,

Kristian
