Getting Data In

How to handle LINE_BREAKER regex for multiple capture groups? Specifically now that we are getting both ip4 and ip6 logs?

Contributor

In the past we had an easy LINE_BREAKER regex that broke on newlines where an ip4 was present ([\r\n]+)\d+.\d+.\d+.\d+

Now we have some logs with ip6 in addition to ip4 being logged, so I was hoping I can just do this via piping it out to alternate capture groups depending on which ip it matches:

([\r\n]+)(\d+.\d+.\d+.\d+|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))

Is there something present where splunk only expects one capture group to be here for the LINE_BREAKER regex? I'm wondering how we can handle linebreakers now that we have 2 different style of IP that can come in.

0 Karma
1 Solution

SplunkTrust
SplunkTrust

Reading the LINE_BREAKER documentation I'm wondering if it's something to do with the parentheses around the regex match after the ([\r\n]+)

As per the props.conf documentation it says:

Example 1:  LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3

  * A line ending with 'end' followed a line beginning with 'begin' would
    match the first branch, and the first capturing group would have a match
    according to rule 1.  That particular newline would become a break
    between lines.

So I'm assuming you probably don't want to have the various (), also you could probably simplify it to match part of the IP address, unless you often have lines that look similar, normally I would match the first few parts of the IP address or similar...

Example:

([\r\n]+)\d+\.\d+\.\d+\.\d+|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}

Perhaps?
If this works I'll convert it to an answer...

View solution in original post

SplunkTrust
SplunkTrust

Reading the LINE_BREAKER documentation I'm wondering if it's something to do with the parentheses around the regex match after the ([\r\n]+)

As per the props.conf documentation it says:

Example 1:  LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3

  * A line ending with 'end' followed a line beginning with 'begin' would
    match the first branch, and the first capturing group would have a match
    according to rule 1.  That particular newline would become a break
    between lines.

So I'm assuming you probably don't want to have the various (), also you could probably simplify it to match part of the IP address, unless you often have lines that look similar, normally I would match the first few parts of the IP address or similar...

Example:

([\r\n]+)\d+\.\d+\.\d+\.\d+|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}

Perhaps?
If this works I'll convert it to an answer...

View solution in original post

Contributor

Thanks gareth, feel free to convert to answer and I will mark it as solved!

SplunkTrust
SplunkTrust

Done, thanks!

0 Karma
State of Splunk Careers

Access the Splunk Careers Report to see real data that shows how Splunk mastery increases your value and job satisfaction.

Find out what your skills are worth!