Understanding LINE_BREAKER regexes

stevesq
Explorer

I'm trying to wrap my head around LINE_BREAKER regexes, especially WRT whitespace handling and wildcard matching.

Given a file containing:

y z
xx1
xx2
y z
xx3
xx4
y y
xx5
xx6

And applying either of:

LINE_BREAKER = ([\r\n]+)(?:y\s+z)

LINE_BREAKER = ([\r\n]+)(?:y.*?z)

Splunk will make a new event at "y y", even though I don't want it to. In other words,

I expect:

y z
xx1
xx2

y z
xx3
xx4
y y
xx5
xx6

But Splunk actually produces:

y z
xx1
xx2

y z
xx3
xx4

y y
xx5
xx6

Presumably it's matching the "y\s+" / "y.*?" and deciding to break on that line. What am I missing? How can I get it to recognize the "z" in the regex?

woodcock
Esteemed Legend

Did you set SHOULD_LINEMERGE = false? This should work:

LINE_BREAKER = ([\r\n]+)y\s+z
SHOULD_LINEMERGE = false

hexx
Splunk Employee

This appears to be one of those elusive cases where LINE_BREAKER fails where BREAK_ONLY_BEFORE succeeds...

I was able to reproduce the problem you report with LINE_BREAKER and your test data. However, using:

BREAK_ONLY_BEFORE = y\s+z

...I get the expected results. Give BREAK_ONLY_BEFORE a try and let us know if it works. Remember to remove the ([\r\n]+) capture group as BREAK_ONLY_BEFORE doesn't need it.

From props.conf.spec:

BREAK_ONLY_BEFORE = <regular expression>
* When set, Splunk creates a new event only if it encounters a new line that matches the
regular expression.
* Defaults to empty.
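
To make the quoted rule concrete, here is a minimal Python sketch of the merge step as the spec describes it. It is only a rough model, not Splunk's implementation: the function name merge_lines is hypothetical, Python's re module stands in for Splunk's regex engine, and other merge settings (for example BREAK_ONLY_BEFORE_DATE or MAX_EVENTS) are ignored.

```python
import re

# Test data from the original question.
SAMPLE = "y z\nxx1\nxx2\ny z\nxx3\nxx4\ny y\nxx5\nxx6\n"

def merge_lines(stream, break_only_before):
    """Group lines into events, opening a new event on each matching line.

    Hypothetical helper: a simplified model of BREAK_ONLY_BEFORE only.
    """
    pattern = re.compile(break_only_before)
    events = []
    for line in stream.splitlines():
        if not events or pattern.search(line):
            events.append([line])      # matching line starts a new event
        else:
            events[-1].append(line)    # otherwise merge into current event
    return ["\n".join(lines) for lines in events]

events = merge_lines(SAMPLE, r"y\s+z")
for event in events:
    print(repr(event))
```

Under these assumptions the "y y" line does not match y\s+z, so it stays inside the second event.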

I have opened a bug (SPL-41430) to have our developers take a look at this issue.

UPDATE: As Masa stated, if you are using LINE_BREAKER, you must also set SHOULD_LINEMERGE = false. The test file is properly line-broken with the following configuration:


LINE_BREAKER = ([\r\n]+)y\s+z
SHOULD_LINEMERGE = false
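
For anyone who wants to check the regex outside Splunk, the following Python sketch approximates what LINE_BREAKER does when line merging is off. It assumes, per props.conf.spec, that the text matched by the first capture group is discarded and marks the event boundary. The function name break_events is hypothetical, and Python's re module is used in place of Splunk's regex engine, so treat it as an illustration rather than a reproduction of Splunk's behavior.

```python
import re

# Test data from the original question.
SAMPLE = "y z\nxx1\nxx2\ny z\nxx3\nxx4\ny y\nxx5\nxx6\n"

def break_events(stream, line_breaker):
    """Split stream at each LINE_BREAKER match, discarding capture group 1.

    Hypothetical helper: models LINE_BREAKER with SHOULD_LINEMERGE = false.
    """
    events = []
    start = 0
    for m in re.finditer(line_breaker, stream):
        events.append(stream[start:m.start(1)])  # event ends before group 1
        start = m.end(1)                         # next event starts after it
    tail = stream[start:].rstrip("\r\n")         # drop the trailing newline
    if tail:
        events.append(tail)
    return events

events = break_events(SAMPLE, r"([\r\n]+)y\s+z")
for event in events:
    print(repr(event))
```

In this model the regex matches only once, before the second "y z" line, so the regex alone does not explain a break at "y y".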

Masa
Splunk Employee

In general, I would use the same solution that hexx suggested.

(I could not add a comment, so I'm using another answer field.)

Additional Info:

Splunk processes a stream of data as follows:

  1. Break the stream into single lines.
    LINE_BREAKER is used here.
    (At this point, Splunk does not know whether an event is a single line or not.)

  2. Check whether multiple lines need to be merged into one event.
    SHOULD_LINEMERGE, BREAK_ONLY_BEFORE, etc. work here.
    (At this point, Splunk recognizes each event as either multi-line or single-line.)

I think it's possible that the issue in your case occurred at line-merge time.
Also, a lookahead (?=...) would be more appropriate here than a non-capturing group (?:...), because a lookahead checks for "y z" without consuming it.
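
The difference between the two constructs can be seen by comparing match spans. This is a small Python sketch using the re module rather than Splunk's regex engine, and the helper name describe is hypothetical. It only shows how much text each pattern consumes, not how Splunk treats that text.

```python
import re

# Test data from the original question.
SAMPLE = "y z\nxx1\nxx2\ny z\nxx3\nxx4\ny y\nxx5\nxx6\n"

def describe(pattern):
    """Return (group-1 span, full-match span) for every match in SAMPLE."""
    return [(m.span(1), m.span(0)) for m in re.finditer(pattern, SAMPLE)]

# Non-capturing group: "y z" is part of the overall match.
non_capturing = describe(r"([\r\n]+)(?:y\s+z)")

# Lookahead: "y z" is only checked, so the match ends after the newline.
lookahead = describe(r"([\r\n]+)(?=y\s+z)")

print("non-capturing:", non_capturing)
print("lookahead:    ", lookahead)
```

Both patterns capture the same newline in group 1, so the break lands in the same place. Only the lookahead leaves "y z" unconsumed, and neither pattern matches before "y y".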

So, there is an alternative solution:

LINE_BREAKER = ([\r\n]+)(?=y\s+z)
SHOULD_LINEMERGE = false
LEARN_MODEL = false

I did a quick test with this, and it worked for me.

If this does not work, the learned app may contain an automatically generated props.conf configuration for this data.
In that case, delete that part of $SPLUNK_HOME/etc/apps/learned/local/props.conf.
