rex does not match to the end of a string, but the same regex works in other applications

_jgpm_
Communicator

I'm on 6.4.3. I'm trying to template a text parser in Splunk that will basically delimit sentences in many different use cases. If there is a better way of doing this, please let me know.

As far as I can tell, this is a rex issue specific to Splunk. I use regex101.com to test my regexes before using them in Splunk, and this works almost 100% of the time. Here is one of the edge cases that I can't figure out.

This is the _raw:
Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe.
Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe. Among other applications:

This is the rex expression:
| rex field=_raw "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[A-Z].+)" |

This is encode:
Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe.

This is decode:
Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe.

I can't get the last part, "Among other applications:", to appear in decode. I've tried adding $ and replacing the .+ with explicit character classes. Almost all attempts result in encode capturing the whole _raw and decode being null.

I don't want to just drop the last bit of text, I want to capture 'em all. Can someone help me out with the regex before I pull out my hair?

Thanks.

koshyk
Super Champion

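Please try the regex below. It replaces [A-Z].+ in the decode group with [\w\W]+, which matches any character including newlines (by default, . does not match a newline).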

rex field=_raw "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[\w\W]+)"

Example with the complete value:

| makeresults | eval key="Wi-Fi delivery to cars remains a major target application for European car makers, in spite of the regulatory challenges in Europe. Xxxxx Xxxxxx was demonstrating its solution already available in 400,000 Xxxx vehicles shipped in Europe. Among other applications:" | rex field=key "(\x{F0B7} )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[\w\W]+)" | table encode, decode
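
For anyone who wants to see the difference outside Splunk, here is a rough sketch in Python. Python's re module is not the PCRE engine that rex uses, so \x{F0B7} is written as \uf0b7, but the handling of . and newlines is the same in both. The sample text is shortened, and the newline before "Among" is an assumption, since the post does not show the exact whitespace in _raw.

```python
import re

# Assumed sample: shortened text with a guessed newline before "Among".
raw = (
    "Wi-Fi delivery to cars remains a major target application. "
    "Xxxxx Xxxxxx was demonstrating its solution.\n"
    "Among other applications:"
)

# Shared first half of both patterns (bullet, first sentence, separator).
PREFIX = r"(\uf0b7 )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ]"

# Original decode group: "." stops at the first newline.
original = re.compile(PREFIX + r"(?P<decode>[A-Z].+)")

# Suggested decode group: [\w\W] matches every character, newlines included.
fixed = re.compile(PREFIX + r"(?P<decode>[\w\W]+)")

decode_original = original.search(raw).group("decode")
decode_fixed = fixed.search(raw).group("decode")

print(repr(decode_original))  # ends at the first sentence boundary before the newline
print(repr(decode_fixed))     # runs to the end of the text
```

If the real event has no newline at that position, both patterns would capture the same text, so this only illustrates the likely cause.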

woodcock
Esteemed Legend

I suspect the problem is embedded newlines or unexpected extra whitespace, so try this:

| rex "(?ms)(\x{F0B7}\s+)?(?<encode>[A-Z][^\.]*\.)[\x22”]?\s+(?<decode>[A-Z].+)"
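
A small sketch of why \s+ helps when the sentences are separated by something other than a single space. This is Python rather than SPL, so \x{F0B7} becomes \uf0b7 and the named groups use the (?P<name>...) form. The two-line sample text is made up for illustration.

```python
import re

# Hypothetical sample: the first two sentences are separated by a newline only.
text = "First sentence ends here.\nSecond sentence follows. Among other applications:"

# Pattern from the question: requires exactly one literal space after the period.
literal_space = re.compile(
    r"(\uf0b7 )?(?P<encode>[A-Z].+?[.])[\x22”]?[ ](?P<decode>[A-Z].+)"
)

# Pattern from this answer: \s+ accepts newlines, tabs, and repeated spaces.
flexible = re.compile(
    r"(?ms)(\uf0b7\s+)?(?P<encode>[A-Z][^\.]*\.)[\x22”]?\s+(?P<decode>[A-Z].+)"
)

strict = literal_space.search(text)
loose = flexible.search(text)

print(repr(strict.group("encode")))  # skips the first sentence entirely
print(repr(loose.group("encode")))   # starts at the first sentence
print(repr(loose.group("decode")))   # everything after the separator
```

One thing to watch: [^\.]*\. ends encode at the first period, so a sentence containing a version number or an abbreviation would be cut short there.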

_jgpm_
Communicator

That worked as well. I reduced it to this:

(?ms)(\x{F0B7})?(?P<encode2>[A-Z][^\.]*\.)[\x22”]? (?P<decode2>[A-Z].+)

which worked. Bonus points for showing me how to use inline regex flags within the expression.
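
On the inline flags: as far as I can tell, only the s (dotall) flag matters for this pattern, because m (multiline) changes how ^ and $ behave and the pattern has neither. A quick Python sketch of that, using a made-up sample with a newline before the last fragment (Python spells \x{F0B7} as \uf0b7; decode_with is just a helper name for this example):

```python
import re

# Hypothetical sample with a newline before the trailing fragment.
text = "One sentence. Two sentence.\nAmong other applications:"

# The reduced pattern from this reply, without any inline flags.
BODY = r"(\uf0b7)?(?P<encode2>[A-Z][^\.]*\.)[\x22”]? (?P<decode2>[A-Z].+)"

def decode_with(flags: str) -> str:
    """Prefix the pattern with inline flags and return the decode2 capture."""
    return re.search(flags + BODY, text).group("decode2")

with_ms = decode_with("(?ms)")  # both flags, as posted
with_s = decode_with("(?s)")    # dotall only
with_m = decode_with("(?m)")    # multiline only
no_flags = decode_with("")      # no flags

print(with_ms == with_s)   # dotall alone gives the same capture
print(with_m == no_flags)  # multiline alone changes nothing here
```

Leaving m in is harmless, so (?ms) is fine to keep as a habit.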

_jgpm_
Communicator

@koshyk Your regex worked with the fewest changes.
