Getting Data In

How to handle a LINE_BREAKER regex with multiple capture groups, now that we are getting both IPv4 and IPv6 logs?

briancronrath
Contributor

In the past we had a simple LINE_BREAKER regex that broke on newlines followed by an IPv4 address: ([\r\n]+)\d+.\d+.\d+.\d+

Now some of our logs contain IPv6 addresses in addition to IPv4, so I was hoping I could handle this with alternate capture groups, depending on which kind of address matches:

([\r\n]+)(\d+.\d+.\d+.\d+|(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]).){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))

Does Splunk expect only one capture group in the LINE_BREAKER regex? I'm wondering how to handle line breaking now that two different styles of IP address can come in.

1 Solution

gjanders
SplunkTrust

Reading the LINE_BREAKER documentation, I wonder whether the problem is the extra parentheses around the part of the regex that follows ([\r\n]+).

The props.conf documentation says:

Example 1:  LINE_BREAKER = end(\n)begin|end2(\n)begin2|begin3

  * A line ending with 'end' followed a line beginning with 'begin' would
    match the first branch, and the first capturing group would have a match
    according to rule 1.  That particular newline would become a break
    between lines.

So I assume you probably don't want the various extra () groups. You could also simplify the regex to match only part of the IP address, unless other lines in your data often look similar. I would normally match just the first few parts of the address.

Example:

([\r\n]+)\d+\.\d+\.\d+\.\d+|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}

Perhaps?
If this works I'll convert it to an answer...
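
One way to sanity-check the two-branch pattern outside Splunk is a small Python sketch like the one below. It only approximates LINE_BREAKER behaviour (the text matched by the newline group is discarded and the rest of the match starts the next event), and the `split_events` helper and the sample log lines are made up for illustration, not taken from the original data.

```python
import re

# Two-branch pattern from the post above: dots escaped, and IPv6 recognised
# by its first five colon-separated groups only.
LINE_BREAKER = re.compile(
    r"([\r\n]+)\d+\.\d+\.\d+\.\d+"
    r"|([\r\n]+)[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}"
    r":[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}"
)

def split_events(raw):
    """Rough approximation of LINE_BREAKER (hypothetical helper):
    drop the matched newline group, keep everything else."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(raw):
        # Exactly one of the two newline groups takes part in any match.
        group = 1 if m.group(1) is not None else 2
        events.append(raw[start:m.start(group)])
        start = m.end(group)
    events.append(raw[start:])
    return events

# Made-up sample: an IPv4 event with a continuation line, an IPv6 event,
# then another IPv4 event.
sample = (
    "10.0.0.1 GET /index.html\n"
    "  continuation line\n"
    "2001:db8:0:0:0:0:0:1 GET /status\n"
    "192.168.1.5 POST /login"
)
events = split_events(sample)
for event in events:
    print(repr(event))
```

In this sketch the continuation line stays attached to the first event, because the newline before it is not followed by an address. Note that compressed IPv6 forms such as ::1 or fe80::1 do not have five non-empty leading groups, so this simplified pattern would not break before them.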


briancronrath
Contributor

Thanks gareth, feel free to convert to answer and I will mark it as solved!
