Regex Search: How can I make sure that the next character is not a backslash?

isxtn
Explorer

I am trying to dig through some records and extract the q (query) value from the raw data, but I keep getting data back that includes a backslash after the requested field (mostly as a Unicode escape, \u0026, which represents an &).

For example, I have this search query to capture the page from which a search is being made (i.e., "location"): 

 

index="xxxx-data" | regex query="location=([a-zA-Z0-9_]+)+[^&]+" | rex field=_raw "location=(?<location>[a-zA-Z0-9%-]+).*" | rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"| table location,q

 

This mostly works when viewing the Statistics tab, except that it occasionally also returns the next URL parameter, e.g.:

location      q
home_page     hello+world                                         // this is ok
about_page    goodbye+cruel+world\u0026anotherparam=anotherval    // not ok

 The second result should just be goodbye+cruel+world without the following parameter.

I have tried adding variations on regex NOT [^\\] for a backslash character but everything I've tried has either resulted in an error of the final bracket being escaped, or the backslash character ignored like so:

rex field=_raw  ...

Regex attempts and their results:

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\\]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\u0026]).*"
Result:  Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\u0026]).*': Regex: PCRE does not support \L, \l, \N{name}, \U, or \u.

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^u0026]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+[^&]).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"
Result:  goodbye+cruel+world\u0026param=val

Attempt: "q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"
Result:  goodbye+cruel+world\u0026param=val

Events tab data is like: 

 

Event

apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200

 

... etc ...

So, how can I capture just the value of q, stopping before any \ or & that follows it?

And please, if you would be so kind, include an explanation of why your suggestion works.

Thanks


ITWhisperer
SplunkTrust

I'm not entirely sure whether your _raw field contains an & or \u0026. Either way, for backslashes you have to escape the escape:

| rex "q=(?<q>[^\\\\]+)"
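
On why four backslashes are needed: going by the error messages quoted in the question, the quoted SPL string appears to be unescaped once before it reaches the regex engine (the attempt written as [^\\] arrived as [^\]). So \\\\ in the search would reach PCRE as \\, which is one literal backslash. Here is a rough sketch of the pattern the engine ends up with, written in Python rather than SPL. Python's re module spells named groups (?P<q>...), and both the sample string and the extract_q helper are made up for illustration.

```python
import re

# What the regex engine is assumed to see after SPL unescapes "\\\\" once: [^\\]+
# A raw string passes both backslashes through to the regex engine unchanged.
PATTERN = re.compile(r"q=(?P<q>[^\\]+)")

# Hypothetical sample modelled on the question; the backslash is a literal character.
raw = r"location=about_page\u0026q=goodbye+cruel+world\u0026param=val"


def extract_q(text):
    """Return the q value up to (not including) the first backslash, or None."""
    match = PATTERN.search(text)
    return match.group("q") if match else None


print(extract_q(raw))  # goodbye+cruel+world
```

Note that this class only stops at a backslash; a literal & would still be captured.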

isxtn
Explorer

Thank you. The raw data contains an actual "&", not the Unicode escape. Even when I take the "&" out of the regex, I still get the Unicode escape in the result.

And when I changed the regex to 

 

rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"

 

I still get the \u0026param=val ... 

Is there another pattern I should use? 


ITWhisperer
SplunkTrust
| rex "q=(?<q>[^&]+)"
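
A likely reason the original character class kept swallowing the \ and the & no matter what was appended to it: inside [...], the sequence %-_ is read as a range from % (0x25) to _ (0x5F), and that range contains both & (0x26) and \ (0x5C). A negated class such as [^&]+ avoids the issue by consuming everything up to the next &. The sketch below checks both points in Python rather than SPL (named groups are written (?P<q>...) there); the helper names are made up for illustration and the query string is the one from the sample event.

```python
import re

# The character class from the question; "%-_" is parsed as a range, not three literals.
ORIGINAL_CLASS = re.compile(r"[a-zA-Z0-9%-_&+/]")

# Negated class: capture everything up to the next literal "&".
STOP_AT_AMP = re.compile(r"q=(?P<q>[^&]+)")


def in_original_class(ch):
    """True if the single character ch is matched by the original class."""
    return ORIGINAL_CLASS.fullmatch(ch) is not None


def extract_q(text):
    """Return the q value up to (not including) the next '&', or None."""
    match = STOP_AT_AMP.search(text)
    return match.group("q") if match else None


query = "param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val"

print(in_original_class("\\"), in_original_class("&"))  # True True
print(extract_q(query))  # goodbye+cruel+world
```

If _raw actually holds the escaped form \u0026 rather than a literal &, a class that excludes both characters ([^&\\\\]+ inside the SPL string, following the same escaping as the earlier suggestion) would be the analogous pattern. I have not tested that against your data.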