Splunk not breaking events on line break properly

jcfergus
Engager

Ok, I'm at my wits' end here. I have an application log which produces events of the format:

DEBUG | 2012-02-16 11:01:30,683 [http-10.0.0.1-8443-Processor6] SystemFile  - field1=value1 timestamp=2012-02-16 11:01:30.679 CST   field2=value2   field3=value3   field4=value4   field5= field6=value6   field7=A field value with spaces in it  field8=
DEBUG | 2012-02-16 11:01:32,457 [http-10.0.0.1-8443-Processor10] SystemFile  - field1=value1    timestamp=2012-02-16 11:01:32,450 CST   field2=value2   field3= field4=value4   field5= field6=value6   field7=Another field with spaces in it  field8=value8

Basically tab-delimited name/value pairs, with nice neat newlines at the end of the lines (I've verified the line breaks and tabs in a hex editor, and all events are being written via the same log4j config). I -thought- I had it all being parsed just fine, but it appears that the index-time parsing is not always splitting the events on newlines, and I end up with two (or three, or four, or five) log lines in one event. They have different timestamps, so it's not that it's rolling them up into one (the two events above are a sanitized example of a pair that got rolled together). I would suspect the cause is that the first one ends with an equals sign (no value), but plenty of events in the same log that look identical get split properly. I'm stumped.

My props.conf for the log source looks like:

[MySourceType]
LINE_BREAKER = ([\r\n]+)
REPORT-tab-kv-manual = tab-kv-manual
KV_MODE = NONE
TIME_PREFIX = DEBUG
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30

And my transforms.conf looks like:

[tab-kv-manual]
REGEX = (\t|- )([^=]+)=([^\t\n]*)
FORMAT = $2::$3
REPEAT_MATCH = true

Any suggestions?

thisissplunk
Builder

Did you ever figure this out? Having the same issue. Testing the explicit line breaker currently.

somesoni2
Revered Legend

What is your data format? Also, set SHOULD_LINEMERGE = false in props.conf along with LINE_BREAKER. It defaults to true, so Splunk can re-merge the lines that LINE_BREAKER has already split.

kristian_kolb
Ultra Champion

I've been there as well, and while it looks like your LINE_BREAKER regex is correct, I think I remember that being a bit more explicit solved the issue:

LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+

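Also, your TIME_PREFIX is wrong. It is a regex for the text that comes before the timestamp, and a literal DEBUG only matches DEBUG-level events. It should be: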

TIME_PREFIX = ^[A-Z]+\s+\|\s+

Hope this helps,

Kristian
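
Putting the suggestions in this thread together, the props.conf stanza might look something like the sketch below. It has not been tested against the original poster's data. The stanza name and the search-time settings are copied from the question, and the assumption is that every event starts with an uppercase log level, a pipe, and a date, as in the two sample lines.

```ini
[MySourceType]
# Treat each LINE_BREAKER match as an event boundary and skip line merging
SHOULD_LINEMERGE = false
# Break on newlines only when the next line starts like "DEBUG | 2012..."
# (the captured newline run is discarded; the rest begins the next event)
LINE_BREAKER = ([\r\n]+)[A-Z]+\s+\|\s+\d+
# Skip any log level plus the pipe, not just the literal DEBUG
TIME_PREFIX = ^[A-Z]+\s+\|\s+
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 30
# Search-time field extraction, unchanged from the original post
REPORT-tab-kv-manual = tab-kv-manual
KV_MODE = NONE
```

These are index-time settings, so they only affect data indexed after the change (and after the parsing instance has picked up the new config). Events that were already merged stay merged unless the data is re-indexed.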
