I am digging through some records and trying to extract the q (query) parameter from the raw data, but the captured value keeps running past the requested field into a backslash sequence (mostly a Unicode escape, \u0026, which represents an &).
For example, I have this search query to capture the page from which a search is being made (i.e., "location"):
index="xxxx-data" | regex query="location=([a-zA-Z0-9_]+)+[^&]+" | rex field=_raw "location=(?<location>[a-zA-Z0-9%-]+).*" | rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/]+).*"| table location,q
This mostly works when viewed in the Statistics tab, except that it occasionally returns the next URL parameter as well, e.g.:
location | q |
home_page | hello+world // this is ok |
about_page | goodbye+cruel+world\u0026anotherparam=anotherval // not ok |
The second result should just be goodbye+cruel+world without the following parameter.
I have tried adding variations of a negated class such as [^\\] to exclude the backslash character, but everything I've tried has either produced an error because the closing bracket ends up escaped, or left the backslash in the result, like so:
rex field=_raw ...
regex attempt | result |
"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\\]).*" | goodbye+cruel+world\u0026param=val |
"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\]).*" | Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class. |
"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*" | Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\]).*': Regex: missing terminating ] for character class. |
"q=(?<q>[a-zA-Z0-9%-_&+/]+[^\\u0026]).*" | Error in 'rex' command: Encountered the following error while compiling the regex 'q=(?<q>[a-zA-Z0-9%-_&+/]+[^\u0026]).*': Regex: PCRE does not support \L, \l, \N{name}, \U, or \u. |
"q=(?<q>[a-zA-Z0-9%-_&+/]+[^u0026]).*" | goodbye+cruel+world\u0026param=val" |
"q=(?<q>[a-zA-Z0-9%-_&+/]+[^&]).*" | goodbye+cruel+world\u0026param=val" |
"q=(?<q>[a-zA-Z0-9%-_&+/]+).*" | goodbye+cruel+world\u0026param=val |
"q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)" | goodbye+cruel+world\u0026param=val |
Events tab data is like:
Event
apple: honeycrisp
ball: baseball
car: Ferrari
query: param1=val1&param2=val2&param3=val3&q=goodbye+cruel+world&param=val
status: 200
... etc ...
So, how can I get q to return just its own value, stopping at the first \ or & that follows it?
And please, if you would be so kind, include an explanation of why what you suggest works?
Thanks
I'm not entirely sure whether your _raw field contains an & or \u0026. Either way, to match a backslash you have to escape the escape:
| rex "q=(?<q>[^\\\\]+)"
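To illustrate why this works, here is a minimal Python sketch of the same idea. It is not SPL: Python's `re` writes named groups as `(?P<q>...)` rather than `(?<q>...)`, and the sample string is an assumption based on the table in the question. The negated class `[^\\]` means "any character except a backslash", so the capture stops right before `\u0026`. The error messages in the question show that SPL strips one layer of backslashes from the quoted string before the regex is compiled, which is why four backslashes are needed there where a Python raw string needs two.

```python
import re

# Sample value assumed from the question: the "&" arrives as a literal \u0026.
raw = r"query: location=about_page&q=goodbye+cruel+world\u0026param=val"

# [^\\] is "any character except a backslash"; in a raw string the regex
# engine receives exactly two backslashes, i.e. one escaped backslash.
pattern = re.compile(r"q=(?P<q>[^\\]+)")

match = pattern.search(raw)
q = match.group("q") if match else None
print(q)  # goodbye+cruel+world
```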
Thank you. The raw data contains an actual "&", not the Unicode escape. Even when I take the "&" out of the regex, I still get the Unicode escape in the result.
And when I changed the regex to
rex field=_raw "q=(?<q>[a-zA-Z0-9%-_&+/^\\\\]+)"
I still get the \u0026param=val ...
Is there another pattern I should use?
| rex "q=(?<q>[^&]+)"
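As for why the original pattern kept running: inside a character class, `%-_` is not three literal characters but a range from `%` (0x25) to `_` (0x5F). That range contains `&` (0x26), `=` (0x3D), and the backslash (0x5C), so the class matched straight through `\u0026param=val`, and removing the explicit `&` changed nothing. Likewise, a `^` that is not the first character of a class is only a literal caret, so appending `^\\\\` to the class added characters instead of excluding them. The Python sketch below shows this behaviour. It is an illustration rather than SPL: the sample string is assumed from the question's output, and Python spells the named group `(?P<q>...)`.

```python
import re

# Sample value assumed from the question's table output.
raw = r"q=goodbye+cruel+world\u0026param=val"

# "%-_" is a range (0x25..0x5F); it silently includes "&", "=" and "\".
in_range = [c for c in "&=\\" if ord("%") <= ord(c) <= ord("_")]

# The question's class: the range lets the match run to the end of the string.
greedy = re.search(r"q=(?P<q>[a-zA-Z0-9%-_&+/]+)", raw).group("q")

# A negated class stops at the first "&" or backslash, whichever form appears.
fixed = re.search(r"q=(?P<q>[^&\\]+)", raw).group("q")

print(in_range)  # ['&', '=', '\\']
print(greedy)    # goodbye+cruel+world\u0026param=val
print(fixed)     # goodbye+cruel+world
```

If the value can be followed by either form, the SPL equivalent would presumably be `[^&\\\\]+`, using the same four-backslash escaping as the earlier reply. If a literal hyphen is wanted in a positive class, putting it first or last (for example `[a-zA-Z0-9%_+/-]`) avoids creating an accidental range.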