rex does not match to the end of a string, but the same regex works in other applications

_jgpm_
Communicator

I'm on 6.4.3. I'm trying to template a text parser in Splunk that will basically delimit sentences in many different use cases. If there is a better way of doing this, please let me know.

As far as I can tell, this is a rex issue specific to Splunk. I use regex101.com to test my regexes before using them in Splunk, and that works almost 100% of the time. Here is one of the edge cases that I can't figure out.

This is the _raw:
Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe.
Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe. Among other applications:

This is the rex expression:
| rex field=_raw "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[A-Z].+)" |

This is encode:
Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe.

This is decode:
Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe.

I can't get the final "Among other applications:" to appear in decode. I've tried adding $ and replacing the .+ with explicit characters. Almost all attempts result in encode capturing the whole _raw and decode being null.

I don't want to just drop the last bit of text, I want to capture 'em all. Can someone help me out with the regex before I pull out my hair?

Thanks.

koshyk
Super Champion

Please try the regex below. It replaces [A-Z].+ in the decode group with [\w\W]+, which also matches newline characters:

rex field=_raw "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[\w\W]+)"

Example with the complete value:

 | makeresults | eval key="Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe. Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe. Among other applications:" |  rex field=key "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[\w\W]+)" | table encode, decode
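
To see why swapping .+ for [\w\W]+ changes the result, here is a minimal sketch in Python rather than SPL. Python's re module is only a stand-in for the PCRE engine that rex uses; both treat . as "any character except newline" by default. The sample text assumes a newline before "Among", which the thread never confirms, and \x{F0B7} is written as \uF0B7 because Python's re does not accept the \x{...} form.

```python
import re

# Assumed sample text: the newline before "Among" is a guess that reproduces
# the symptom from the question; the thread does not show the raw bytes.
raw = (
    "Wi-Fi delivery to cars remains a major target application for European "
    "car makers, in spite of the regulatory challenges in Europe. "
    "Xxxxx Xxxxxx was demonstrating its solution already available in "
    "400,000 Xxxx vehicles shipped in Europe.\nAmong other applications:"
)

# Shared prefix from the thread, with the code points spelled the Python way.
prefix = r"(\uF0B7 )?(?P<encode>[A-Z].+?[.])[\x22\u201d]?[ ]"

original = re.compile(prefix + r"(?P<decode>[A-Z].+)")   # . stops at a newline
fixed = re.compile(prefix + r"(?P<decode>[\w\W]+)")      # [\w\W] does not

for name, rx in (("original", original), ("fixed", fixed)):
    decode = rx.search(raw).group("decode")
    # True only when the group reached the final fragment.
    print(name, decode.endswith("Among other applications:"))
```

Under that assumption the original pattern stops decode at "shipped in Europe." while the [\w\W]+ version runs to the end of the text. The makeresults example above has no embedded newline, so it will not show the difference by itself.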

woodcock
Esteemed Legend

I suspect the problem is embedded newlines or unexpected extra whitespace, so try this:

| rex "(?ms)(\x{F0B7}\s+)?(?<encode>[A-Z][^\.]*\.)[\x22”]?\s+(?<decode>[A-Z].+)"
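
A rough illustration of what the inline flags change, again in Python as a stand-in for rex's PCRE engine. The event text is hypothetical (line breaks after each sentence are an assumption), and the named groups use (?P<name>...) because Python's re does not accept the (?<name>...) spelling.

```python
import re

# Hypothetical event text: the line breaks after each sentence are assumed.
raw = (
    "First sentence ends here.\n"
    "Second sentence ends here.\n"
    "Among other applications:"
)

# The pattern from this reply, with (?<name>...) spelled (?P<name>...).
pattern = (
    r"(?ms)(\uF0B7\s+)?(?P<encode>[A-Z][^\.]*\.)"
    r"[\x22\u201d]?\s+(?P<decode>[A-Z].+)"
)

with_flags = re.search(pattern, raw)
# Same pattern with the inline flags stripped, for comparison.
without_flags = re.search(pattern.replace("(?ms)", "", 1), raw)

print(repr(with_flags.group("decode")))
print(repr(without_flags.group("decode")))
```

With the flags, decode runs through the embedded newline to the end of the text; without them it stops at the end of the second sentence. The s flag is the one doing the work here, since it lets . match newlines. The m flag only changes how ^ and $ behave, and this pattern uses neither. The \s+ between the groups also tolerates a newline or extra spaces where the original pattern required exactly one space.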

_jgpm_
Communicator

That worked as well. I reduced it to this:

(?ms)(\x{F0B7})?(?P<encode2>[A-Z][^\.]*\.)[\x22”]? (?P<decode2>[A-Z].+)

which also worked. Bonus points for showing me how to use inline regex flags within the expression.

_jgpm_
Communicator

@koshyk's regex worked with the fewest changes.
