Regex search: How to make sure the next character is NOT a backslash?

isxtn
Explorer

I am digging through some records and trying to extract the q (query) parameter from the raw data, but the value I get back often includes a backslash and whatever follows it (mostly a Unicode escape sequence, \u0026, which is an &).

For example, I have this search query to capture the page from which a search is being made (i.e., "location"): 

 

index="xxxx-data" | regex query="location=([a-zA-Z0-9_]+)+[^&]+" | rex field=_raw "location=(?<location>[a-zA-Z0-9%-]+).*" | rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"| table location,q

 

This mostly works when viewed in the Statistics tab, except that it occasionally also returns the next URL parameter, e.g.:

location      q
home_page     hello+world   // this is ok
about_page    goodbye+cruel+world\u0026anotherparam=anotherval    // not ok

The second result should just be goodbye+cruel+world, without the following parameter.

I have tried adding variations of a negated character class such as [^\\] to exclude the backslash, but every attempt has either produced an error because the closing bracket ends up escaped, or has had no effect on the backslash, like so:

rex field=_raw  ...

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\\]).*"
result:         goodbye+cruel+world\u0026param=val

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\]).*"
result:         Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*"
result:         Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\u0026]).*"
result:         Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\u0026]).*': Regex: PCRE does not support \L, \l, \N{name}, \U, or \u.

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+[^u0026]).*"
result:         goodbye+cruel+world\u0026param=val

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+[^&]).*"
result:         goodbye+cruel+world\u0026param=val

regex attempt:  "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"
result:         goodbye+cruel+world\u0026param=val

 

The data in the Events tab looks like this:

 

Event

apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200

 

... etc ...

So, how can I get q to return just its own value, stopping at the first \ or & and ignoring everything after it?

And please, if you would be so kind, include an explanation of why your suggestion works.

Thanks


danspav
SplunkTrust

Hi @isxtn,

There are probably a few ways to tackle this; here's one that may work for you:

| rex field=_raw "q=(?<q>.+?)(&|\\\u\d)"

That breaks down like this:

Create a field called "q" that captures every character after "q=" until it sees either:

  • an & or
  • the literal string "\u" followed by a digit

This should match when things are correctly separated by an ampersand, but also if the ampersand is character encoded.
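To check that breakdown outside Splunk, here is a minimal Python sketch of the same pattern. It is an approximation, not SPL: Python's `re` spells the named group `(?P<q>...)` where PCRE accepts `(?<q>...)`, the pattern is written as the regex engine is assumed to see it after the search string has been unescaped, and `extract_q` is a hypothetical helper name.

```python
import re

# Lazy capture after "q=", ended by "&" or by a literal "\u" plus a digit.
# Assumption: this is the regex the engine receives once SPL has unescaped it.
PATTERN = re.compile(r"q=(?P<q>.+?)(&|\\u\d)")


def extract_q(raw):
    """Return the captured q value, or None when the pattern does not match."""
    match = PATTERN.search(raw)
    return match.group("q") if match else None


# The two shapes of event from the thread: a plain "&" and an encoded one.
plain = "query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val"
encoded = r"query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world\u0026param=val"

print(extract_q(plain))    # goodbye+cruel+world
print(extract_q(encoded))  # goodbye+cruel+world
```

One thing the sketch makes visible: the pattern requires a terminator, so an event where q is the last parameter (nothing after the value) produces no match at all.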

The question mark after the .+ in the regex makes the match non-greedy (lazy), so the capture stops at the first "&" or "\u" plus digit that it sees instead of running on to the last one.
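The difference between the greedy and the lazy form can be sketched in Python as follows. The sample string is made up for illustration (it adds a second parameter so that there are two possible stopping points), and the `(?P<q>...)` spelling is Python's equivalent of `(?<q>...)`.

```python
import re

# Hypothetical sample: an encoded "&" first, then a plain "&" later on.
raw = r"q=goodbye+cruel+world\u0026param=val&other=1"

# Greedy: .+ takes everything, then backs off only as far as the LAST terminator.
greedy = re.search(r"q=(?P<q>.+)(&|\\u\d)", raw).group("q")

# Lazy: .+? grows one character at a time and stops at the FIRST terminator.
lazy = re.search(r"q=(?P<q>.+?)(&|\\u\d)", raw).group("q")

print(greedy)  # goodbye+cruel+world\u0026param=val
print(lazy)    # goodbye+cruel+world
```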

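To avoid the "Regex: PCRE does not support \L, \l, \N{name}, \U, or \u" error, the backslash itself has to be escaped, so that the regex engine sees a literal backslash followed by "u" rather than the unsupported \u escape. That is why the pattern is written with \\\u.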
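At the regex level, the idea is that `\\` stands for one literal backslash. The Python sketch below shows only that layer, not SPL's own string unescaping (which, judging by the error messages quoted in the question, appears to collapse `\\` to `\` before the regex is compiled). The event text is assumed to contain a real backslash followed by "u0026".

```python
import re

# Build the sample explicitly so the single backslash is unmistakable.
raw = "q=goodbye+cruel+world" + "\\" + "u0026param=val"

# In the regex, "\\" matches one literal backslash; "u" and "\d" follow it.
literal_backslash_u = re.compile(r"\\u\d")

found = literal_backslash_u.search(raw)
print(found.group(0))        # the three characters: backslash, "u", "0"
print(raw[:found.start()])   # q=goodbye+cruel+world
```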

Here's a test search to show it in action:

| makeresults
| eval raw = "apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world\\u0026param=val
status: 200@apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200"
| makemv raw delim="@" | mvexpand raw
| rename raw as _raw
| rex field=_raw "q=(?<q>.+?)(&|\\\u\d)"
| table _raw, q

That results in:

[Image: danspav_0-1691131179367.png]
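As for why the original attempts kept running past the backslash: one likely cause is the character class itself. Inside [a-zA-Z0-9%-_&+/], the sequence %-_ is read as a range from "%" to "_", and that range includes "&", "=" and "\". Appending [^\\] afterwards only requires the last captured character to be something other than a backslash, so the greedy class still swallows the rest of the line. The Python sketch below illustrates this; the corrected class at the end is an assumption about which characters q may contain, not something taken from the thread.

```python
import re

# The character class from the original search, unchanged.
ORIGINAL_CLASS = re.compile(r"[a-zA-Z0-9%-_&+/]")

# "%-_" is a range from "%" (0x25) to "_" (0x5F), which covers
# "&" (0x26), "=" (0x3D) and "\" (0x5C), among others.
for ch in ["&", "=", "\\"]:
    print(repr(ch), bool(ORIGINAL_CLASS.fullmatch(ch)))  # True for each one

# One possible fix (an assumption): put "-" last so it is literal, and leave
# "&" out, so the capture ends at a backslash or at a plain "&".
FIXED = re.compile(r"q=(?P<q>[a-zA-Z0-9%_+/-]+)")

encoded = r"q=goodbye+cruel+world\u0026param=val"
plain = "q=goodbye+cruel+world&param=val"
print(FIXED.search(encoded).group("q"))  # goodbye+cruel+world
print(FIXED.search(plain).group("q"))    # goodbye+cruel+world
```

Unlike the lazy pattern, this variant does not need a terminator, so it should also work when q is the last parameter in the query string.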


Cheers,
Daniel
