Getting Data In

How does LINE_BREAKER_LOOKBEHIND in props.conf work?

ankithreddy777
Contributor

May I know how exactly LINE_BREAKER_LOOKBEHIND works? I am a little confused by the explanation given in the Splunk documentation. An example would be great.

Masa
Splunk Employee

In general, there is no need to consider this attribute.

I believe this is how LINE_BREAKER_LOOKBEHIND is used.

Data arrives as a stream, and Splunk allocates memory (chunks of data) for the stream data.

Assuming I know that each new event starts with a timestamp such as 2017-01-03 12:00:00, I would set:

LINE_BREAKER = ([\n\r]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}

If chunk 1 ends with a newline character and a partial timestamp, while chunk 2 contains the rest of the timestamp, Splunk needs both chunk 1 and chunk 2 to check and match the LINE_BREAKER regex. LINE_BREAKER_LOOKBEHIND keeps the last part of chunk 1 available when checking chunk 2. The default is 100 bytes.
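To make the chunk-boundary problem concrete, here is a small Python sketch of the idea described above. It is only a toy model of my reading of this reply, not Splunk's actual implementation: the chunk contents are hypothetical, and the "keep the last 100 characters" step is an assumption about how the lookbehind is used.

```python
import re

# LINE_BREAKER from the post above; capture group 1 is the event delimiter.
LINE_BREAKER = re.compile(r"([\n\r]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}")

# Hypothetical chunks: the second timestamp is split across the boundary.
chunk1 = "2017-01-03 12:00:00 first event\n2017-01-03 12:0"
chunk2 = "0:01 second event\n"

# Taken separately, neither chunk contains a full "newline + timestamp".
match_chunk1 = LINE_BREAKER.search(chunk1)
match_chunk2 = LINE_BREAKER.search(chunk2)

# Assumption: keep the tail of chunk1 (100 characters, mirroring the
# default) and prepend it to chunk2 so the regex sees the whole boundary.
lookbehind = 100
joined = chunk1[-lookbehind:] + chunk2
match_joined = LINE_BREAKER.search(joined)

print(match_chunk1, match_chunk2)   # None None
print(repr(match_joined.group(1)))  # '\n'
```

The point is only that the boundary becomes visible once the end of the first chunk is kept around; how Splunk buffers the data internally is not shown here.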

yoho
Contributor

I was wondering the same and came to more or less the same conclusion as Masa, but I have more questions. Let's take a concrete example:

chunk1 : [data1]\n 2017-01-03 12:00:00 [data2]\n 2017-01-03 12:0
chunk2 : 0:01 [data3]\n ...

So when Splunk processes chunk1, the regex matches only once: after [data1]. data1 is therefore correctly identified as being part of an event (but not yet data2 and the timestamp preceding it).
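The single match can be checked with a short Python sketch. It reuses the LINE_BREAKER regex from Masa's reply; as an assumption, the chunk is written with real newlines and without the space after each "\n", because that regex expects the timestamp to follow the newline directly.

```python
import re

# LINE_BREAKER from Masa's reply; capture group 1 is the event delimiter.
LINE_BREAKER = re.compile(r"([\n\r]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}")

# chunk1 from the example (assumption: no space between "\n" and timestamp).
chunk1 = "[data1]\n2017-01-03 12:00:00 [data2]\n2017-01-03 12:0"

# Positions of every delimiter the regex can confirm inside chunk1 alone.
breaks = [m.span(1) for m in LINE_BREAKER.finditer(chunk1)]

first_event = chunk1[:breaks[0][0]]  # text before the first delimiter
leftover = chunk1[breaks[0][1]:]     # everything after it is undecided

print(breaks)       # [(7, 8)]
print(first_event)  # [data1]
```

The second newline is not reported because the timestamp after it is cut off at the chunk boundary, so everything from the first timestamp onward stays in `leftover`.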

When processing chunk2, Splunk will prefix the chunk with LOOK_BEHIND bytes of chunk1. The regex will then match if LOOK_BEHIND is greater than or equal to length(data2) + 2*length(timestamp).

However, I find it all quite clumsy. First, Splunk should know that ALL of the leftover from chunk1 should be processed (why limit it to LOOK_BEHIND?). Second, the default of 100 bytes appears quite small: it will only work for events whose total length plus timestamp length is below 100 bytes.

Am I missing something?


hrawat_splunk
Splunk Employee

LINE_BREAKER_LOOKBEHIND = <integer>
* The number of bytes before the end of the raw data chunk
  to which Splunk software should apply the 'LINE_BREAKER' regex.
* When there is leftover data from a previous raw chunk,
  LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of
  the raw chunk (with the next chunk concatenated) where Splunk software
  applies the LINE_BREAKER regex.

First of all, the above setting kicks in only if you have a 'LINE_BREAKER' regex set.

Assume your 'LINE_BREAKER' regex is '\n'.

First pass: chunk1 is processed, and since there is no leftover from a previous chunk,

chunk1 : [data1]\n 2017-01-03 12:00:00 [data2]\n 2017-01-03 12:0

this results in two events, data1 and data2. The rest is leftover for chunk2.
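As a rough illustration of this first pass, here is a Python sketch that splits chunk1 on "\n" and treats whatever follows the last newline as leftover. It is a simplified model under that assumption, not Splunk's code.

```python
# chunk1 exactly as written in the example, with real newline characters.
chunk1 = "[data1]\n 2017-01-03 12:00:00 [data2]\n 2017-01-03 12:0"

# Every piece that ends at a "\n" is a complete event; the final piece
# has no terminating newline yet, so it is carried over as leftover.
*events, leftover = chunk1.split("\n")

print(events)         # ['[data1]', ' 2017-01-03 12:00:00 [data2]']
print(repr(leftover)) # ' 2017-01-03 12:0'
print(len(leftover))  # 16
```

The leftover here is 16 bytes, well under the default of 100.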

Second pass: chunk2 is processed. Since we have a leftover, LINE_BREAKER_LOOKBEHIND is applied only if the leftover size > LINE_BREAKER_LOOKBEHIND.

chunk2 : 0:01 [data3]\n ...

In this example LINE_BREAKER_LOOKBEHIND was not applicable, as the leftover bytes < LINE_BREAKER_LOOKBEHIND (default 100).

In a scenario where it is applicable, all Splunk does is exclude the first LINE_BREAKER_LOOKBEHIND bytes of the new string (leftover + chunk2) from the regex.

Why apply the regex to the entire leftover when we already know it contains no match (from the first pass)?
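The second pass can be sketched in Python as follows. This rests on an assumption: it reads the quoted spec text as "start the regex LINE_BREAKER_LOOKBEHIND bytes before the end of the leftover", which may not match the real implementation exactly. The helper `regex_start_offset` is a hypothetical name, not a Splunk function.

```python
LOOKBEHIND_DEFAULT = 100  # default value of LINE_BREAKER_LOOKBEHIND

def regex_start_offset(leftover: str, lookbehind: int = LOOKBEHIND_DEFAULT) -> int:
    """Offset in (leftover + next chunk) where the search begins.

    Assumed model: if the leftover is no longer than the lookbehind,
    search from the start; otherwise skip the older part of the leftover.
    """
    return max(0, len(leftover) - lookbehind)

leftover = " 2017-01-03 12:0"   # 16 bytes carried over from chunk1
chunk2 = "0:01 [data3]\n"
combined = leftover + chunk2

start = regex_start_offset(leftover)     # leftover < 100, nothing skipped
break_at = combined.find("\n", start)    # LINE_BREAKER assumed to be "\n"

print(start)                      # 0
print(repr(combined[:break_at]))  # ' 2017-01-03 12:00:01 [data3]'
print(regex_start_offset("x" * 250))  # 150
```

With a 250-byte leftover, this model would skip the first 150 bytes, which were already scanned without a match during the previous pass.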



yoho
Contributor

In addition, I'm wondering how this all works with an indexer cluster where chunks are spread over multiple indexers (for example, if you have UFs connecting to indexers in round robin).
