Splunk Search

Why does my regular expression ignore escaped double quotes in value?

lumpymilk
Explorer

When extracting the request or cookie from httpd logs, I'm having problems capturing an entire request when the request contains an escaped double quote. The reason appears to be how Splunk handles the sequence \".

For example, if the request field of the log contains this data ...

"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\""

Then a regular expression for \"(?[^\"]*?)\" will capture http://www.mydomain.com/request.pl?clientData=someVar:\

If I try \"(?(?:(\x5c\x22|[^\"]))*?)\" then the search fails with an error saying "Please check log"... no details.

If I try \"(?(?:(\x5c\x21|[^\"]))*?)\" then the search completes with no error. Too bad \x21 isn't what I'm looking for.

If I try \"(?(?:(\x5c.|[^\"]))*?)\" in the hopes that ANY character preceded by a backslash will match then I get an error again.

The simple question is how would one capture data between double quotes where the data may contain escaped double quotes?


lumpymilk
Explorer

Can someone explain how to handle the \" characters in a capture group when my field boundaries are double quotes? That's what I really need. It seems like Splunk has a problem when I escape the backslash and double quotes in my regex. Other regex tools are able to handle things like \"(?(\\"|[^\"])?)\" or \"(?(?:(\\"|[^\"]))?)\" just fine, but Splunk errors on them.
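
For reference, the usual regex idiom for "quoted value that may contain escaped quotes" is to consume either a backslash plus any one character, or any character that is neither a quote nor a backslash. Below is a minimal sketch in Python's `re` module, not in Splunk itself. The group name `request` is a hypothetical name chosen for illustration. Splunk's `rex` uses PCRE, where the same pattern should be valid, but quoting the pattern inside a search string adds another escaping layer that this sketch does not cover.

```python
import re

# Sample value from the question: a quoted field containing escaped quotes.
line = r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\""'

# Naive pattern: [^"] stops at the first quote, even one preceded by a backslash.
NAIVE = re.compile(r'"(?P<request>[^"]*?)"')

# Escape-aware pattern. Inside the quotes, repeat either:
#   \\.      a backslash followed by any single character (covers \" and \\)
#   [^"\\]   any character that is neither a quote nor a backslash
# The group name "request" is hypothetical.
QUOTED = re.compile(r'"(?P<request>(?:\\.|[^"\\])*)"')

naive = NAIVE.search(line).group("request")
request = QUOTED.search(line).group("request")

print(naive)    # stops right after the backslash
print(request)  # the whole value, escaped quotes included
```

The escaped quotes stay in the captured value as `\"`; unescaping them would be a separate step.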


mpreddy
Communicator

Try something like this:

| stats c | eval _raw="2015-03-27T15:49:34 http://www.mydomain.com/request.pl?field2=value2&field1=value1&field4=value4&clientData=someVar:\"th... is the important data\"&field3=value3 data2"  |rex "^[^\?\n]*\?(?P<url_parameter>.*) "  | rex max_match=10 field=url_parameter "(?<url_parameter_field>\w+)=" | rex max_match=10 field=url_parameter "=(?<url_parameter_value>[0-9a-zA-Z\:\\\"\ ]*)" | fields - c

richgalloway
SplunkTrust

The regex "(?<url>.*)" works on regex101.com.

---
If this reply helps you, Karma would be appreciated.

lumpymilk
Explorer

Let me clarify a little. It is in fact a little more complicated than I originally stated.

The data is in W3C format. "(?<url>.*)" would match, but with the data looking like this ...

"data" "data" "data" data data data "http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" "other data" "more data"

\"(?[^\"]*?)\"\s\"(?[^\"]*?)\"\s\"(?[^\"]*?)\"\s(?\S*?)\s(?\S*?)\s(?\S*?)\s\"(?.*)\"

matches more than the request data.
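
The over-matching comes from the trailing greedy `.*`: it runs to the end of the line and backs up only as far as the last double quote. A small Python sketch of the difference, using just the final quoted group rather than the full seven-field pattern (the group name `request` is hypothetical, and Python's `re` stands in for Splunk's PCRE here):

```python
import re

# Sample line from this post, in the same quoted/unquoted layout.
line = (
    r'"data" "data" "data" data data data '
    r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" '
    r'"other data" "more data"'
)

# Greedy: .* consumes the rest of the line, then backtracks to the LAST quote,
# so the capture swallows the two fields that follow the request.
greedy = re.search(r'"(?P<request>http.*)"', line).group("request")

# Escape-aware: the body may contain backslash-escaped characters, but the
# match ends at the first unescaped quote.
bounded = re.search(r'"(?P<request>http(?:\\.|[^"\\])*)"', line).group("request")

print(greedy)
print(bounded)
```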


richgalloway
SplunkTrust

If you just want the URL then "(?<url>http.*)" will match it.

If you're trying to match all of the fields, then you have a trickier problem because no single delimiter separates the fields. Space won't work because of embedded spaces and some fields aren't quoted.
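
One possible way around the mixed delimiters is to tokenize instead of splitting: at each position, take either a whole quoted field (allowing backslash escapes) or a run of non-space characters. This is only a sketch in Python, with a hypothetical helper name `split_fields`, and it assumes every quoted field is properly terminated. It is not a Splunk configuration.

```python
import re

# Alternation: a quoted field with backslash escapes, OR a bare token.
FIELD = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)')


def split_fields(line):
    """Return the fields of a line that mixes quoted and unquoted values."""
    fields = []
    for m in FIELD.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        # group(1) is None when the bare-token branch matched; an empty
        # quoted field ("") yields '' and is kept.
        fields.append(quoted if quoted is not None else bare)
    return fields


line = (
    r'"data" "data" "data" data data data '
    r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" '
    r'"other data" "more data"'
)

fields = split_fields(line)
for position, value in enumerate(fields):
    print(position, value)
```

With the sample line, the request lands at index 6 with its escaped quotes intact, and the quoted fields containing spaces stay whole.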

---
If this reply helps you, Karma would be appreciated.