How does LINE_BREAKER_LOOKBEHIND in props.conf work?

ankithreddy777
Contributor

How exactly does LINE_BREAKER_LOOKBEHIND work? I am a little confused by the explanation given in the Splunk documentation. An example would be great.

Masa
Splunk Employee

In general, you do not need to consider this attribute.

I believe this is how LINE_BREAKER_LOOKBEHIND is used.

Data arrives as a stream, and Splunk allocates memory for the stream data in chunks.

Assume I know that every event starts with a timestamp such as 2017-01-03 12:00:00, so an event ends right before the next timestamp:

LINE_BREAKER = ([\n\r]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}

If chunk 1 contains the newline character and a partial timestamp, while chunk 2 contains the rest of the timestamp, Splunk needs both chunk 1 and chunk 2 to check and match the LINE_BREAKER regex. LINE_BREAKER_LOOKBEHIND keeps the last part of chunk 1 available when checking chunk 2. The default is 100 characters.
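The boundary problem can be reproduced with a few lines of Python. This is only a toy model of the behaviour described in this answer, not Splunk's actual implementation: the two chunk strings and the `lookbehind` variable are made up for illustration, and characters stand in for bytes because the data is plain ASCII.

```python
import re

# The LINE_BREAKER regex from the post above.
LINE_BREAKER = re.compile(r"([\n\r]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}")

# Hypothetical chunks: the "newline + timestamp" sequence straddles the boundary.
chunk1 = "2017-01-03 11:59:59 first event\n2017-01-"
chunk2 = "03 12:00:00 second event\n"

# Neither chunk on its own contains a complete "newline + timestamp" sequence.
alone1 = LINE_BREAKER.search(chunk1)
alone2 = LINE_BREAKER.search(chunk2)

# Toy model: keep the last `lookbehind` characters of chunk1 in front of chunk2.
lookbehind = 100  # the default quoted in the post
window = chunk1[-lookbehind:] + chunk2
joined = LINE_BREAKER.search(window)

print(alone1, alone2)         # no match in either chunk alone
print(repr(joined.group(0)))  # the break is found once the tail of chunk1 is kept
```

In this model the match only appears once the end of chunk 1 is visible while chunk 2 is being scanned, which is the situation the attribute is meant to cover.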

yoho
Contributor

I was wondering the same thing and came to more or less the same conclusion as Masa, but I have more questions. Let's take a concrete example:

chunk1 : [data1]\n 2017-01-03 12:00:00 [data2]\n 2017-01-03 12:0
chunk2 : 0:01 [data3]\n ...

When Splunk processes chunk1, the regex matches only once: after [data1]. So [data1] is correctly identified as a complete event, but [data2] and its preceding timestamp are not yet.

When processing chunk2, Splunk prefixes the chunk with the last LINE_BREAKER_LOOKBEHIND bytes of chunk1. The regex then matches if LINE_BREAKER_LOOKBEHIND is greater than or equal to length(data2) + 2*length(timestamp).
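This reading can be checked with a short Python sketch. It models my interpretation only, not Splunk's real code. Two assumptions: the space after each newline in the example is dropped so that Masa's regex can match, and chunk2 is cut at its first newline because the rest is elided. The helper `first_event_with_lookbehind` is hypothetical.

```python
import re

LINE_BREAKER = re.compile(r"([\n\r]+)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}")

# Assumption: spaces after "\n" removed; chunk2 truncated at its first newline.
chunk1 = "[data1]\n2017-01-03 12:00:00 [data2]\n2017-01-03 12:0"
chunk2 = "0:01 [data3]\n"

# Step 1: chunk1 alone. The second timestamp is incomplete, so only one break.
matches = list(LINE_BREAKER.finditer(chunk1))
first = matches[0]
event1 = chunk1[:first.start(1)]   # text before the captured newline
leftover = chunk1[first.end(1):]   # everything after the captured newline

def first_event_with_lookbehind(lookbehind):
    """Hypothetical helper: prefix chunk2 with the last `lookbehind`
    characters of the leftover and return the text before the next break."""
    window = leftover[-lookbehind:] + chunk2
    m = LINE_BREAKER.search(window)
    return window[:m.start(1)] if m else None

print(len(matches), repr(event1))
print(repr(first_event_with_lookbehind(100)))  # whole leftover kept
print(first_event_with_lookbehind(10))         # leftover truncated, break lost
```

With 100 characters the whole leftover survives and the second event comes out intact; with 10 the newline before the second timestamp is cut off and the regex finds nothing.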

However, I find it all quite clumsy. First, Splunk should know that all of the leftover from chunk1 has to be processed (why limit it to LINE_BREAKER_LOOKBEHIND?). Second, the default of 100 bytes appears quite small: it will only work for events whose total length plus timestamp length is below 100 bytes.

Am I missing something?


hrawat_splunk
Splunk Employee

LINE_BREAKER_LOOKBEHIND = <integer>
* The number of bytes before the end of the raw data chunk
  to which Splunk software should apply the 'LINE_BREAKER' regex.
* When there is leftover data from a previous raw chunk,
  LINE_BREAKER_LOOKBEHIND indicates the number of bytes before the end of
  the raw chunk (with the next chunk concatenated) where Splunk software
  applies the LINE_BREAKER regex.

First of all, the above setting kicks in only if you have a 'LINE_BREAKER' regex set.

Assume your 'LINE_BREAKER' regex is '\n'.

On the first pass, chunk1 is processed. Since there is no leftover from a previous chunk,

chunk1 : [data1]\n 2017-01-03 12:00:00 [data2]\n 2017-01-03 12:0

this results in two events, data1 and data2. The rest is leftover to be carried into chunk2.

On the second pass, chunk2 is processed. Since we have a leftover, LINE_BREAKER_LOOKBEHIND is applied only if the leftover size > LINE_BREAKER_LOOKBEHIND.

chunk2 : 0:01 [data3]\n ...

In this example LINE_BREAKER_LOOKBEHIND is not applicable, because the leftover is smaller than LINE_BREAKER_LOOKBEHIND (default 100).
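The two passes on this example can be traced with a small Python sketch. It is a simplified model of the steps described here, not Splunk source code. One assumption: the breaker is written as `(\n)`, since LINE_BREAKER expects a capturing group. The names `events`, `leftover` and `lookbehind_applies` are illustrative.

```python
import re

# Assumption: '\n' written with a capturing group, as LINE_BREAKER expects.
LINE_BREAKER = re.compile(r"(\n)")
LINE_BREAKER_LOOKBEHIND = 100  # default value

chunk1 = "[data1]\n 2017-01-03 12:00:00 [data2]\n 2017-01-03 12:0"

# First pass: split chunk1 at every break; whatever follows the last
# break is the leftover that waits for chunk2.
events = []
pos = 0
for m in LINE_BREAKER.finditer(chunk1):
    events.append(chunk1[pos:m.start(1)])
    pos = m.end(1)
leftover = chunk1[pos:]

# Second pass precondition, as stated above: the setting only matters
# when the leftover is larger than LINE_BREAKER_LOOKBEHIND.
lookbehind_applies = len(leftover) > LINE_BREAKER_LOOKBEHIND

print(events)
print(repr(leftover), len(leftover), lookbehind_applies)
```

The leftover here is only the partial timestamp, far below the default of 100, so the setting has no effect on this particular data.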

In a scenario where it is applicable, all Splunk does is exclude the first LINE_BREAKER_LOOKBEHIND bytes of the new string (leftover + chunk2) from the regex.

Why apply the regex to the entire leftover when we already know from the first pass that it contains no match?


yoho
Contributor

In addition, I'm wondering how this all works with an indexer cluster, where chunks are spread over multiple indexers (for example, when UFs connect to the indexers in round-robin fashion).
