Regex search: How to make sure the next character is NOT a backslash?

isxtn
Explorer

I am digging through some records and trying to extract the q (query) parameter from the raw data, but I keep getting results that include a backslash after the requested value (usually the start of a Unicode escape, \u0026, which represents &).

For example, I have this search query to capture the page from which a search is being made (i.e., "location"): 

 

index="xxxx-data" | regex query="location=([a-zA-Z0-9_]+)+[^&]+" | rex field=_raw "location=(?<location>[a-zA-Z0-9%-]+).*" | rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"| table location,q

 

This mostly works when I view the Statistics tab, except that q occasionally includes the next URL parameter, e.g.:

location      q
home_page     hello+world                                           // this is ok
about_page    goodbye+cruel+world\u0026anotherparam=anotherval      // not ok

 The second result should just be goodbye+cruel+world without the following parameter.

I have tried adding variations on regex NOT [^\\] for a backslash character but everything I've tried has either resulted in an error of the final bracket being escaped, or the backslash character ignored like so:

rex field=_raw  ...

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\\]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\u0026]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\u0026]).*': Regex: PCRE does not support \L, \l, \N{name}, \U, or \u.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^u0026]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^&]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"
Result:  goodbye+cruel+world\u0026param=val

 

The data in the Events tab looks like this:

 

Event

apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200

 

... etc ...

So, how can I get q to return just its own value, stopping at the first \ or & that follows it and ignoring everything after that?

And please, if you would be so kind, include an explanation of why what you suggest works? 

Thanks


danspav
SplunkTrust

Hi @isxtn,

There are probably a few ways to tackle this. Here's one that may work for you:

| rex field=_raw "q=(?<q>.+?)(&|\\\u\d)"

That breaks down like this:

Create a field called "q" that uses up all characters until it sees either:

  • an & or
  • the literal string "\u" followed by a digit

This should match when things are correctly separated by an ampersand, but also if the ampersand is character encoded.

The question mark after the .+ in the regex tells Splunk to use lazy rather than greedy matching, so the capture stops at the first "&" or "\u" that it sees.
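
To see the difference between greedy and lazy matching in isolation, here is a small sketch. The sample event and the field names greedy and lazy are made up for illustration and are not taken from the original data:

```
| makeresults
| eval _raw = "query: q=a+b&x=1&y=2"
| rex field=_raw "q=(?<greedy>.+)&"
| rex field=_raw "q=(?<lazy>.+?)&"
| table _raw, greedy, lazy
```

Here greedy should come back as a+b&x=1, because .+ grabs as much as it can and only backs off to the last &, while lazy should come back as a+b, because .+? stops at the first &.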

To avoid the "Regex: PCRE does not support \L, \l, \N{name}, \U, or \u" error, I've escaped both the backslash and the u character.
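
Another way to get a similar effect, offered as a sketch rather than something tested against the original data, is to describe what the value may not contain instead of what ends it: a negated character class that excludes &, the backslash, and whitespace. This assumes, as the error messages in the question suggest, that the quoted search string turns each \\ into a single \ before the regex is compiled, so four backslashes are needed to put one literal backslash into the class:

```
| makeresults
| eval _raw = "query: param1=val1&q=goodbye+cruel+world\\u0026param=val"
| rex field=_raw "q=(?<q>[^&\\\\\s]+)"
| table _raw, q
```

Under that assumption the regex engine sees [^&\\\s]+, which stops at the first &, backslash, or whitespace character, so q should come back as goodbye+cruel+world. Excluding whitespace also keeps the match from running onto the next line when q is the last parameter in the query string.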

Here's a test search to show it in action:

| makeresults
| eval raw = "apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world\\u0026param=val
status: 200@apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200"
| makemv raw delim="@" | mvexpand raw
| rename raw as _raw
| rex field=_raw "q=(?<q>.+?)(&|\\\u\d)"
| table _raw, q

That results in:

(Screenshot of the results table: danspav_0-1691131179367.png)
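
As for why the original pattern ran past the end of the value: inside a character class, a hyphen between two characters defines a range, so the %-_ in [a-zA-Z0-9%-_&+/] covers every ASCII character from % to _, which includes &, = and the backslash. The class also lists & explicitly. A minimal sketch of a fix along the original lines, assuming the value only ever contains letters, digits, %, _, +, / and -, is to move the hyphen to the end of the class (where it is literal) and drop the &:

```
| makeresults
| eval _raw = "query: param1=val1&q=goodbye+cruel+world\\u0026param=val"
| rex field=_raw "q=(?<q>[a-zA-Z0-9%_+/-]+)"
| table _raw, q
```

With the range gone, the class no longer matches & or a backslash, so q should stop at goodbye+cruel+world.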


Cheers,
Daniel