Regex Search: How can I make sure that the next character is not a backslash?

isxtn
Explorer

I am trying to dig through some records and extract the q (query) parameter from the raw data, but the value I get back keeps including whatever follows the requested field, starting with a backslash (mostly from a Unicode escape, \u0026, which represents an &).

For example, I have this search query to capture the page from which a search is being made (i.e., "location"): 

 

index="xxxx-data" | regex query="location=([a-zA-Z0-9_]+)+[^&]+" | rex field=_raw "location=(?<location>[a-zA-Z0-9%-]+).*" | rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"| table location,q

 

This mostly works when viewed in the Statistics tab, except that it occasionally includes the next URL parameter in the result, e.g.:

location      q
home_page     hello+world                                        // this is ok
about_page    goodbye+cruel+world\u0026anotherparam=anotherval   // not ok

The second result should just be goodbye+cruel+world without the following parameter.

I have tried adding variations of a negated character class such as [^\\] to exclude a backslash, but every attempt has either failed with an error about the closing bracket being escaped, or the backslash has been ignored, like so:

rex field=_raw  ...

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\\]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\u0026]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\u0026]).*': Regex: PCRE does not support \L, \l, \N{name}, \U, or \u.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^u0026]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^&]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"
Result:  goodbye+cruel+world\u0026param=val

Events tab data is like: 

 

Event

apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200

 

... etc ...

So, how can I capture just the value of q, stopping at the first \ or & that follows it and ignoring everything after that?

And please, if you would be so kind, include an explanation of why what you suggest works? 

Thanks


ITWhisperer
SplunkTrust

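I'm not entirely sure whether your _raw field contains a literal & or the escaped form \u0026. Either way, to match a backslash you have to escape the escape: as your error messages show, the quoted search string consumes one level of backslashes before the regex is compiled, so a literal backslash needs four.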

| rex "q=(?<q>[^\\\\]+)"
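
To illustrate why four backslashes are needed, here is a minimal sketch in Python (not SPL, so that it can be run outside Splunk). It assumes the quoted rex string collapses each pair of backslashes into one before the pattern is compiled, which is what the error messages quoted in the question suggest. `spl_unquote` is a hypothetical helper that models only that one step and is not a Splunk API. Note that Python's `re` writes the named group as `(?P<q>...)` where rex uses `(?<q>...)`.

```python
import re

def spl_unquote(typed: str) -> str:
    # Hypothetical model of the quoted-string layer (an assumption, not a
    # Splunk API): every pair of backslashes collapses to a single one
    # before the regex engine sees the pattern.
    return typed.replace("\\\\", "\\")

def try_compile(typed: str):
    # Return (pattern the engine receives, compiled regex or None on error).
    seen = spl_unquote(typed)
    try:
        return seen, re.compile(seen)
    except re.error:
        return seen, None

# Raw strings hold exactly what would be typed between the quotes.
two = r"q=(?P<q>[^\\]+)"      # two backslashes typed
four = r"q=(?P<q>[^\\\\]+)"   # four backslashes typed

seen_two, rx_two = try_compile(two)
seen_four, rx_four = try_compile(four)

# Sample value adapted from the question, with a literal backslash in it.
sample = r"q=goodbye+cruel+world\u0026param=val"
captured = rx_four.search(sample).group("q")

print(seen_two, "->", "compile error" if rx_two is None else "ok")
print(seen_four, "->", captured)
```

With two backslashes typed, the engine receives `[^\]`, where the lone backslash escapes the closing bracket and the class never terminates. With four, it receives `[^\\]`, a valid class meaning "anything except a backslash".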

isxtn
Explorer

Thank you. The raw data contains an actual "&", not the Unicode escape. Even when I take the "&" out of the regex, I still get the Unicode escape in the result.

And when I changed the regex to 

 

rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"

 

I still get the \u0026param=val ... 

Is there another pattern I should use? 


ITWhisperer
SplunkTrust
| rex "q=(?<q>[^&]+)"
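
A possible explanation of why this works: [^&]+ is a negated class, so it consumes every character up to, but not including, the next &. The class in the original search over-matched for a reason unrelated to escaping. Inside brackets, %-_ is a range from % (0x25) to _ (0x5F), and that range contains &, = and the backslash. In addition, a ^ that is not the first character of a class is just a literal caret, so appending ^\\\\ to the class does not negate anything. The Python sketch below checks this outside Splunk. Python's `re` uses `(?P<q>...)` for rex's `(?<q>...)`, the sample strings are adapted from the question, and the `either` and `allow` patterns are illustrative suggestions that do not appear in the thread.

```python
import re

# The range %-_ spans code points 0x25..0x5F.
lo, hi = ord("%"), ord("_")
in_range = {c: lo <= ord(c) <= hi for c in ["&", "=", "\\", "+", "/"]}

# A caret that is not first in a class is a literal member, not a negation.
caret_is_literal = re.fullmatch(r"[a^]", "^") is not None

original = re.compile(r"q=(?P<q>[a-zA-Z0-9%-_&+/]+)")  # class from the question
answer = re.compile(r"q=(?P<q>[^&]+)")                  # pattern from the reply above
either = re.compile(r"q=(?P<q>[^&\\]+)")                # assumption: also stop at a backslash
allow = re.compile(r"q=(?P<q>[a-zA-Z0-9%_+/-]+)")       # assumption: hyphen last, so no range

# The same query string with a literal & and with the escaped form.
literal = "param1=val1&q=goodbye+cruel+world&param=val"
escaped = r"param1=val1\u0026q=goodbye+cruel+world\u0026param=val"

def q(rx, text):
    # Return the captured q value, or None when the pattern does not match.
    m = rx.search(text)
    return m.group("q") if m else None

for name, rx in [("original", original), ("answer", answer),
                 ("either", either), ("allow", allow)]:
    print(name, q(rx, literal), q(rx, escaped), sep=" | ")
```

If the raw event really holds a literal &, the [^&]+ pattern is enough. If it holds the escaped \u0026 instead, a class that also excludes the backslash would be needed, which in a quoted rex string would be typed with four backslashes, as in the earlier reply.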