
How to handle a LINE_BREAKER regex with multiple capture groups, now that we are getting both IPv4 and IPv6 logs?

briancronrath
Contributor

In the past we had a simple LINE_BREAKER regex that broke on newlines followed by an IPv4 address:

([\r\n]+)\d+.\d+.\d+.\d+
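A rough way to see what the old pattern matches is to run it through Python's `re` module. This is an approximation: Splunk uses PCRE, but these simple constructs behave the same. Because the dots are unescaped, each `.` matches any character, so a continuation line starting with a timestamp such as `10:15:30.123` is also treated as the start of a new event. Escaping the dots restricts the match to dotted quads. The helper `break_positions` and the sample stream are hypothetical, for illustration only.

```python
import re

# Original pattern (unescaped dots) and an escaped variant for comparison.
UNESCAPED = re.compile(r"([\r\n]+)\d+.\d+.\d+.\d+")
ESCAPED = re.compile(r"([\r\n]+)\d+\.\d+\.\d+\.\d+")

# Hypothetical raw stream: an IPv4 event, a continuation line that starts
# with a timestamp, and a second IPv4 event.
stream = "10.0.0.1 GET /a\n10:15:30.123 retry\n192.168.1.5 GET /b"


def break_positions(pattern, text):
    """Hypothetical helper: offsets where capture group 1 (the newline run)
    starts, i.e. where the pattern would end one event and begin the next."""
    return [m.start(1) for m in pattern.finditer(text)]


print(break_positions(UNESCAPED, stream))  # → [15, 34]  (timestamp line splits off)
print(break_positions(ESCAPED, stream))    # → [34]
```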

Now some of our logs contain IPv6 addresses in addition to IPv4, so I was hoping I could handle this with alternation (|), using separate capture groups depending on which type of IP matches:

([\r\n]+)(\d+.\d+.\d+.\d+|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))

Does Splunk expect only one capture group in the LINE_BREAKER regex? I'm wondering how we can handle line breaking now that two different styles of IP address can come in.


gjanders
SplunkTrust

Reading the LINE_BREAKER documentation, I'm wondering whether the problem is the parentheses around the pattern that follows ([\r\n]+).

The props.conf documentation says:

Example 1:  LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3

  * A line ending with 'end' followed a line beginning with 'begin' would
    match the first branch, and the first capturing group would have a match
    according to rule 1.  That particular newline would become a break
    between lines.
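The quoted example refers to numbered rules. As I read the props.conf spec, for branched expressions: (1) if the first capturing group is part of the match, it is the line break; (2) otherwise the leftmost capturing group that is part of the match is used; (3) if no group matched, a zero-length break is assumed just before the match. The Python sketch below models those rules for Example 1. The helper `linebreak_span` is hypothetical and only an approximation, not Splunk's implementation.

```python
import re

# Example 1 from the props.conf documentation quoted above.
BREAKER = re.compile(r"end(\n)begin|end2(\n)begin2|begin3")


def linebreak_span(m):
    """Hypothetical model of the documented rules (not Splunk's code).

    Rules 1 and 2: use the leftmost capturing group that took part in the
    match. Rule 3: if none did, assume a zero-length break just before it.
    """
    for g in range(1, m.re.groups + 1):
        if m.start(g) != -1:  # -1 means this group did not participate
            return m.span(g)
    return (m.start(), m.start())


text = "...end\nbegin...end2\nbegin2...begin3..."
for m in BREAKER.finditer(text):
    print(repr(m.group()), linebreak_span(m))
```

For this string the sketch reports (6, 7) for the first branch (group 1, rule 1), (19, 20) for the second branch (group 2, rule 2) and a zero-length (29, 29) just before `begin3` (rule 3).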

So I suspect you don't want the extra sets of () around the alternatives. You could also probably simplify the regex to match only part of the IP address: unless you often have other lines that look similar, I would normally match just the first few segments of the address.
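If the extra parentheses are only there to group the alternatives, one option (a suggestion, not something stated in the thread) is to make them non-capturing with `(?:...)`, which both PCRE and Python's `re` support, so the newline run stays the only capturing group. The pattern below is a hypothetical, shortened combination of the two IP styles; the check simply counts capturing groups via Python's `Pattern.groups`.

```python
import re

# Hypothetical combined pattern: the newline run is the only capturing group;
# the IPv4/IPv6 alternatives sit inside a non-capturing group (?:...).
SINGLE_GROUP = (
    r"([\r\n]+)"
    r"(?:\d+\.\d+\.\d+\.\d+"                       # IPv4 dotted quad
    r"|[0-9a-fA-F]{1,4}(?::[0-9a-fA-F]{1,4}){4})"  # first five IPv6 hextets
)
# The same alternation written with ordinary (capturing) parentheses.
MULTI_GROUP = r"([\r\n]+)(\d+\.\d+\.\d+\.\d+|[0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4}){4})"

print(re.compile(SINGLE_GROUP).groups)  # → 1
print(re.compile(MULTI_GROUP).groups)   # → 3

# Hypothetical sample: the break still lands on the newline before the IPv6 line.
sample = "first event\n2001:db8:85a3:0:0:8a2e:370:7334 second event"
m = re.compile(SINGLE_GROUP).search(sample)
print(repr(m.group(1)))  # → '\n'
```

With a single capturing group, the newline run is always group 1, so only rule 1 ever applies. The example that follows takes the other route and gives each branch its own newline group.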

Example:

([\r\n]+)\d+\.\d+\.\d+\.\d+|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}

Perhaps?
If this works I'll convert it to an answer...
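Before deploying, the suggested pattern can be sanity-checked against a few sample lines. This Python sketch (the log lines are made up; Python's `re` approximates Splunk's PCRE here) prints which capturing group took part in each match. IPv6 lines break via group 2, which rule 2 in props.conf treats as the line break: if the first capturing group is not part of a match, the leftmost capturing group that is becomes the break. Note that the IPv6 branch needs five colon-separated hextets at the start of the line, so a compressed address such as `2001:db8::1` does not trigger a break, and that line would stay attached to the previous event.

```python
import re

# Pattern suggested above, copied verbatim (split across two literals).
SUGGESTED = re.compile(
    r"([\r\n]+)\d+\.\d+\.\d+\.\d+"
    r"|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}"
)

# Hypothetical log excerpt mixing IPv4 and IPv6 client addresses.
stream = (
    "10.1.2.3 - GET /index.html\n"
    "2001:db8:85a3:0:0:8a2e:370:7334 - GET /login\n"
    "192.168.0.7 - POST /api\n"
    "2001:db8::1 - GET /health"
)

for m in SUGGESTED.finditer(stream):
    # Leftmost capturing group that participated in this match.
    group = next(g for g in (1, 2) if m.start(g) != -1)
    next_line = stream[m.end(group):].split("\n", 1)[0]
    print(f"break via group {group} before: {next_line}")
```

Only two breaks are reported: group 2 before the full IPv6 line and group 1 before `192.168.0.7`; the `2001:db8::1` line produces no break.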

briancronrath
Contributor

Thanks Gareth, feel free to convert it to an answer and I will mark it as solved!
