Splunk Search

REX: how to search in the opposite direction

LHumberto
Explorer

I'm new to REX and trying to extract strings from _raw (which is actually malformed JSON, so SPATH is not a good option either).

I was able to create a REX that identifies the pattern I want (or close to it). However, I'm having trouble establishing the correct boundaries, and this is where my lack of experience with REX shows: I cannot get the end of my pattern right. I have pasted the expression that I'm using and a cleaned-up sample of the text I'm dealing with.

| rex field=_raw "next\_best\_thing.+description(?<NBT>.+)topic"

I thought this would identify the beginning of my pattern as next_best_thing (as it does), skip to the first description after it, and capture the group (NBT) as \\\":\\\"Another quick brown fox jumps over the lazy dog.\\\"},{\\\" (everything up to just before the next topic). Naturally, a lot of clean-up would still be necessary, but I would have something to work with.

However, it seems that the search starts from the end of the _raw string, so the description that is matched is in a different part of the text, and the group becomes something completely different from what I intended (\\\":\\\"A third quick brown fox jumps over the lazy dog\xAE Bla Bla BlaBla?\xA0 And a forth The quick brown fox jumps over the lazy dog.\\\"},{\\\").

Also, if the expression is just | rex field=_raw "next\_best\_thing.+description(?<NBT>.+)", omitting the end boundary (topic), the whole match changes: a completely different description is used as the boundary, and naturally the group changes completely as well.

The latter reinforces my impression that the search is being performed from the end of _raw.

Is there a way to change the search direction? Or am I even more wrong / lost than I think on how to establish the boundaries for pattern and group?

"BlaBla_BlaBla_condition\\\":\\\"\\\",\\\"OtherBla\\\":{\\\"description\\\":\\\"The quick brown fox jumps over the lazy dog\\\",\\\"next_best_thing\\\":[{\\\"topic\\\":\\\"Target Public\\\",\\\"description\\\":\\\"Another quick brown fox jumps over the lazy dog.\\\"},{\\\"topic\\\":\\\"Benefit to Someone\\\",\\\"description\\\":\\\"A third quick brown fox jumps over the lazy dog\xAE Bla Bla BlaBla?\xA0 And a forth The quick brown fox jumps over the lazy dog.\\\"},{\\\"topic\\\":\\\"Call to Something\\\",\\\"description\\\":\\\"The fith quick brown fox jumps over the lazy dog.\\\"}]}},\\\"componentTemplate\\\":{\\\"id\\\":\\\"tcm:999-111111-99\\\",\\\"title\\\":\\\"BlaBlaBla_Bla_Bla\\\"},\\\"ia_rendered\\\":\\\"data-slot-id=\\\\\\\"BlaBlaBla\\\\\\\" lang=\\\\\\\"en\\\\\\\" data-offer-id=\\\\\\\"BLABLABLABLABLABLA\\\\\\\" \\\"}\",\"Rank\":\"1\"},\"categoryName\":\"\",\"source\":\"BLA\",\"name\":\"OTHETHINGSHERE_\",\"type\":null,\"placementName\":\"tvprimary\",\"presentationOrderWitinSlot\":1,\"productDetails\":{\"computerApplicationCode\":null,\"productCode\":\"BLA\",\"productSubCode\":\"\"},\"locationProductCode\":null,\"locationProductSubCode\":null,\"priorityWithInProductAndSubCode\":null}],\"error\":null},\"custSessionAvailable\":false},\"ecprFailed\":false,\"svtException\":null}"


ITWhisperer
SplunkTrust

I am not quite sure I follow which string you want to extract, but try something like this

| rex field=_raw "next\_best\_thing.+?description(?<NBT>.+?)}"

Note that the .+ you used is greedy. Adding a question mark (.+?) makes it lazy, so the part after _thing extends only until it reaches the next description, not the last description as the greedy match does.
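The difference can also be reproduced outside Splunk. The following is a minimal sketch in Python, assuming that Python's re module behaves like Splunk's regex engine for these particular patterns (greedy and lazy quantifiers work the same way in both). The raw string is a shortened, hypothetical stand-in for the sample in the question, and Python spells the named group (?P<NBT>...) instead of (?<NBT>...). The regex always scans left to right; a greedy .+ first grabs as much as it can and then backtracks, which is why it looks as if the search starts from the end.

```python
import re

# Shortened, hypothetical stand-in for the _raw sample in the question.
raw = (
    r'{\"description\":\"The quick brown fox\",'
    r'\"next_best_thing\":[{\"topic\":\"Target Public\",'
    r'\"description\":\"Another quick brown fox.\"},'
    r'{\"topic\":\"Benefit to Someone\",'
    r'\"description\":\"A third quick brown fox.\"},'
    r'{\"topic\":\"Call to Something\",'
    r'\"description\":\"The fifth quick brown fox.\"}]}'
)

# Original pattern: greedy .+ backtracks from the end of the string, so it
# settles on the LAST "description" that is still followed by a "topic".
greedy = re.search(r'next_best_thing.+description(?P<NBT>.+)topic', raw)

# Same pattern without the end boundary: greedy .+ now settles on the very
# last "description", and the group runs to the end of the string.
open_ended = re.search(r'next_best_thing.+description(?P<NBT>.+)', raw)

# Suggested pattern: lazy .+? stops at the FIRST "description" after
# next_best_thing, and the lazy group stops at the first closing brace.
lazy = re.search(r'next_best_thing.+?description(?P<NBT>.+?)}', raw)

print("greedy:    ", greedy.group("NBT"))
print("open-ended:", open_ended.group("NBT"))
print("lazy:      ", lazy.group("NBT"))
```

With this sample, the greedy pattern captures the text around "A third quick brown fox", the open-ended one captures the text around "The fifth quick brown fox", and the lazy one captures the text around "Another quick brown fox", which mirrors the behaviour described in the question.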


LHumberto
Explorer

That worked perfectly!


yuanliu
SplunkTrust

Even so, your code will be more robust and much more maintainable if you don't treat JSON data as text. The mock data looks very much like an excerpt from compliant JSON in which part of the object contains an embedded, escaped JSON string, which is why it needs some special handling.

If you can post complete mock data with the original structure, you will see that there is nothing here that Splunk's QA-tested spath command cannot handle.
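To illustrate what "embedded escaped JSON" means, here is a minimal sketch in Python rather than SPL. The event layout and the field name content are assumptions made up for this example, since the complete _raw was not posted. The idea is a two-step decode: parse the outer object first, then parse the string field that itself holds JSON. In Splunk, the analogous approach would be to run spath on the event and then again on the extracted field.

```python
import json

# Hypothetical inner payload, modelled on the sample in the question.
inner = {
    "next_best_thing": [
        {"topic": "Target Public", "description": "Another quick brown fox."},
        {"topic": "Benefit to Someone", "description": "A third quick brown fox."},
    ]
}

# Build a mock event in which the inner JSON is stored as an escaped string.
# The field name "content" is an assumption for illustration only.
event = json.dumps({"content": json.dumps(inner), "Rank": "1"})

# Step 1: decode the outer object; "content" is still a plain string here.
outer = json.loads(event)

# Step 2: decode the embedded string to get real structure back.
decoded = json.loads(outer["content"])

descriptions = [item["description"] for item in decoded["next_best_thing"]]
print(descriptions)
```

Once the embedded string is decoded, each description is addressed by its position in the structure, so no boundary matching is needed.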


LHumberto
Explorer

Hello, yuanliu.

Thank you for reaching out. While I agree that the excerpt that I posted is indeed JSON, the full _raw has much more text, and a lot of cleanup would be necessary before spath could be useful.

Considering my limited experience with Splunk at this point, it would be much more difficult to figure out which errors are caused by my own shortcomings and which are caused by the need to prep _raw for spath to work its magic.


