rex does not match to the end of a string, but the same regex works in other applications

_jgpm_
Communicator

I'm on 6.4.3. I'm trying to template a text parser in Splunk that will basically delimit sentences in many different use cases. If there is a better way of doing this, please let me know.

As far as I can tell, this is a rex issue specific to Splunk. I use regex101.com to test my regexes before using them in Splunk, and this works almost 100% of the time. Here is one of the edge cases that I can't figure out.

This is the _raw:
Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe.
Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe. Among other applications:

This is the rex expression:
| rex field=_raw "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[A-Z].+)" |

This is encode:
Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe.

This is decode:
Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe.

I can't get the final "Among other applications:" to appear in decode. I've tried adding $ and replacing the .+ with explicit character classes. Almost all attempts result in encode capturing the whole _raw and decode being null.

I don't want to just drop the last bit of text, I want to capture 'em all. Can someone help me out with the regex before I pull out my hair?

Thanks.


koshyk
Super Champion

Please try the regex below:

rex field=_raw "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[\w\W]+)"

Example with the complete value:

| makeresults | eval key="Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe. Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe. Among other applications:" | rex field=key "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[\w\W]+)" | table encode, decode
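
A possible explanation for why this change helps, assuming the real _raw contains a line break before "Among" (a guess based on where decode stopped): by default `.` does not match a newline, so `[A-Z].+` ends at the end of the line, while `[\w\W]` matches any character. The sketch below is in Python rather than SPL, since Python's `re` module behaves the same way on this point. The sample text is shortened, the optional `(\x{F0B7} )?` prefix is left out, and `\x22` is written as a plain double quote.

```python
import re

# Shortened stand-in for _raw; the "\n" before "Among" is an assumption.
raw = ("Wi-Fi delivery to cars remains a major target. "
       "Xxxxx Xxxxxx was demonstrating its solution.\n"
       "Among other applications:")

# Original pattern: "." cannot cross the newline, so decode stops early.
dot_pattern = r'(?P<encode>[A-Z].+?[.])["”]?[ ](?P<decode>[A-Z].+)'
# Suggested pattern: [\w\W] matches any character, newline included.
any_pattern = r'(?P<encode>[A-Z].+?[.])["”]?[ ](?P<decode>[\w\W]+)'

dot_decode = re.search(dot_pattern, raw).group("decode")
any_decode = re.search(any_pattern, raw).group("decode")

print(repr(dot_decode))  # second sentence only
print(repr(any_decode))  # second sentence plus the trailing line
```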


woodcock
Esteemed Legend

I suspect the problem is embedded newlines or unexpected extra whitespace, so try this:

| rex "(?ms)(\x{F0B7}\s+)?(?<encode>[A-Z][^\.]*\.)[\x22”]?\s+(?<decode>[A-Z].+)"

_jgpm_
Communicator

That worked as well. I reduced it to this:

(?ms)(\x{F0B7})?(?P<encode2>[A-Z][^\.]*\.)[\x22”]? (?P<decode2>[A-Z].+)

which worked. Bonus points for showing me how to use inline regex flags within the expression.
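
For readers new to inline flags: a leading `(?ms)` switches on multiline mode (`m`, which lets `^` and `$` match at line boundaries) and dot-all mode (`s`, which lets `.` match newlines). Since this pattern has no `^` or `$`, the `s` flag is probably the one doing the work here. The sketch below uses Python instead of SPL, with a made-up two-line sample and the `\x{F0B7}` prefix omitted, to show that the inline form behaves like passing the flags separately.

```python
import re

# Hypothetical sample with an embedded newline before the last fragment.
raw = "First sentence. Second sentence.\nAmong other applications:"

pattern = r'(?P<encode2>[A-Z][^.]*\.)["”]? (?P<decode2>[A-Z].+)'

# No flags: "." stops at the newline.
plain = re.search(pattern, raw)
# Inline flags at the start of the expression, as in the rex above.
inline = re.search("(?ms)" + pattern, raw)
# The same flags passed as arguments instead of inline.
explicit = re.search(pattern, raw, re.MULTILINE | re.DOTALL)

print(repr(plain.group("decode2")))
print(repr(inline.group("decode2")))
print(inline.group("decode2") == explicit.group("decode2"))
```

Note that `[^\.]*` crosses line breaks even without any flag, because a negated character class matches newlines.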


_jgpm_
Communicator

koshyk's regex worked with the fewest changes.
