Splunk Search

Regex Search: How can I make sure that the next character is not a backslash?

isxtn
Explorer

I am trying to dig through some records and extract the q (query) parameter from the raw data, but I keep getting results that include a backslash after the requested field (mostly as a Unicode escape, \u0026, which represents &).

For example, I have this search query to capture the page from which a search is being made (i.e., "location"): 

 

index="xxxx-data" | regex query="location=([a-zA-Z0-9_]+)+[^&]+" | rex field=_raw "location=(?<location>[a-zA-Z0-9%-]+).*" | rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"| table location,q

 

Which mostly works viewing the Statistics tab, except that it occasionally returns the next URL parameter, i.e.,

location      q
home_page     hello+world                                         // this is ok
about_page    goodbye+cruel+world\u0026anotherparam=anotherval    // not ok

 The second result should just be goodbye+cruel+world without the following parameter.

I have tried adding variations on regex NOT [^\\] for a backslash character but everything I've tried has either resulted in an error of the final bracket being escaped, or the backslash character ignored like so:

rex field=_raw  ...

regex attempt
    result

"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\\]).*"
    goodbye+cruel+world\u0026param=val

"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\]).*"
    Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*"
    Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class.

"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\u0026]).*"
    Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\u0026]).*': Regex: PCRE does not support \L, \l, \N{name}, \U, or \u.

"q=(?<q>[a-zA-Z0-9%-_&+/]+[^u0026]).*"
    goodbye+cruel+world\u0026param=val

"q=(?<q>[a-zA-Z0-9%-_&+/]+[^&]).*"
    goodbye+cruel+world\u0026param=val

"q=(?<q>[a-zA-Z0-9%-_&+/]+).*"
    goodbye+cruel+world\u0026param=val

"q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"
    goodbye+cruel+world\u0026param=val

The data in the Events tab looks like this:

 

Event

apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200

 

... etc ...

So, how can I get q to return just its own value, stopping at the first \ or & that follows it and leaving out any later parameters?

And please, if you would be so kind, include an explanation of why what you suggest works? 

Thanks


ITWhisperer
SplunkTrust

I'm not entirely sure whether your _raw field contains an & or \u0026. Either way, for backslashes you have to escape the escape:

| rex "q=(?<q>[^\\\\]+)"
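To illustrate why the four backslashes are needed: the error messages quoted in the question show that the SPL string "[^\\]" reaches the regex compiler as [^\], so the quoted string loses one level of backslashes before PCRE sees it. By the same rule, "[^\\\\]+" arrives as [^\\]+, which means "one or more characters that are not a backslash". The sketch below checks that regex in Python rather than in Splunk, which is an assumption of convenience: Python's re module behaves the same way for this pattern, but it spells named groups (?P<q>...) where Splunk accepts (?<q>...).

```python
import re

# Sample value from the question. The raw string keeps \u0026 as six literal
# characters (backslash, u, 0, 0, 2, 6), which is how it appears in the results.
raw = r"q=goodbye+cruel+world\u0026anotherparam=anotherval"

# This is the regex as the engine sees it after SPL has unescaped "[^\\\\]+":
# a negated character class containing a single backslash.
pattern = re.compile(r"q=(?P<q>[^\\]+)")

match = pattern.search(raw)
q = match.group("q")
print(q)  # goodbye+cruel+world
```

The capture stops at the first backslash, so the trailing \u0026anotherparam=anotherval is left out.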

isxtn
Explorer

Thank you. The raw data contains an actual "&", not the Unicode escape. Even when I take the "&" out of the regex, I still get the Unicode escape back.

And when I changed the regex to 

 

rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"

 

I still get the \u0026param=val ... 

Is there another pattern I should use? 
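A possible explanation for why the backslash keeps slipping through, offered as a sketch rather than a confirmed diagnosis: inside a character class, %-_ is read as a range from % (0x25) to _ (0x5F), not as three separate characters. That range already contains \ (0x5C) and & (0x26), so the class accepts both no matter what else is listed. In addition, a ^ that is not the first character of the class is a literal caret, not a negation, so ^\\\\ at the end adds the backslash to the allowed set instead of excluding it. The Python check below uses (?P<q>...) because Python's re module does not accept the (?<q>...) spelling; the "fixed" class with the hyphen moved to the end and & removed is a hypothetical rewrite, not something from the thread.

```python
import re

# The original character class, as the regex engine sees it.
original_class = re.compile(r"[a-zA-Z0-9%-_&+/]")

# Characters that fall inside the %-_ range even though they are never listed.
in_range = [ch for ch in ["\\", "&", "=", "?"] if ord("%") <= ord(ch) <= ord("_")]
matches_backslash = original_class.fullmatch("\\") is not None

# Hypothetical fix: a hyphen at the end of the class is a literal hyphen,
# and & is dropped so the match stops at a parameter separator.
fixed_class = re.compile(r"[a-zA-Z0-9%_+/-]")
fixed_matches_backslash = fixed_class.fullmatch("\\") is not None
fixed_matches_amp = fixed_class.fullmatch("&") is not None

raw = r"q=goodbye+cruel+world\u0026anotherparam=anotherval"
q = re.search(r"q=(?P<q>[a-zA-Z0-9%_+/-]+)", raw).group("q")

print(in_range)                 # ['\\', '&', '=', '?']
print(matches_backslash)        # True
print(fixed_matches_backslash)  # False
print(q)                        # goodbye+cruel+world
```

An allow-list like this will also cut the value short at any character it does not list, so a negated class that names only the terminators may be the simpler choice.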


ITWhisperer
SplunkTrust
| rex "q=(?<q>[^&]+)"
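As a rough illustration of what this pattern does: [^&]+ captures everything after q= up to, but not including, the next literal &. It only helps when the separator in _raw really is &; if the separator is stored as the text \u0026, there is no & to stop at. The sketch below runs both cases in Python (which writes named groups as (?P<q>...) instead of (?<q>...)). The combined class [^&\\]+, which stops at either character, is an assumption added for comparison and not part of the answer above; following the escaping rule discussed in this thread, it would presumably be written "q=(?<q>[^&\\\\]+)" in SPL, which has not been tested here.

```python
import re

# The event text from the question (literal &) and the value seen in the
# results table (separator stored as the six characters \u0026).
with_ampersand = "query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val"
with_escape = r"q=goodbye+cruel+world\u0026anotherparam=anotherval"

stop_at_amp = re.compile(r"q=(?P<q>[^&]+)")        # the pattern from the answer
stop_at_either = re.compile(r"q=(?P<q>[^&\\]+)")   # hypothetical: stop at & or backslash


def extract(pattern, text):
    """Return the captured q value, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("q") if m else None


print(extract(stop_at_amp, with_ampersand))     # goodbye+cruel+world
print(extract(stop_at_amp, with_escape))        # goodbye+cruel+world\u0026anotherparam=anotherval
print(extract(stop_at_either, with_ampersand))  # goodbye+cruel+world
print(extract(stop_at_either, with_escape))     # goodbye+cruel+world
```

Note that an unanchored q= would also match inside a longer parameter name that ends in q, so the pattern may need a leading separator if such names occur in the data.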