Why does my regular expression ignore escaped double quotes in value?

lumpymilk
Explorer

When extracting the request or cookie from httpd logs I'm having problems capturing an entire request when the request contains an escaped double quote. The reason appears to be in the handling of this sequence \" by Splunk.

For example, if the request field of the log contains this data:

"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\""

Then a regular expression for \"(?[^\"]*?)\" will capture http://www.mydomain.com/request.pl?clientData=someVar:\

If I try \"(?(?:(\x5c\x22|[^\"]))*?)\" then the search fails with an error saying "Please check log"... no details.

If I try \"(?(?:(\x5c\x21|[^\"]))*?)\" then the search completes with no error. Too bad \x21 isn't what I'm looking for.

If I try \"(?(?:(\x5c.|[^\"]))*?)\" in the hopes that ANY character preceded by a backslash will match then I get an error again.

The simple question is how would one capture data between double quotes where the data may contain escaped double quotes?
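
For reference, the usual pattern for a double-quoted value that may contain escaped quotes is "an escaped character, or any character that is neither a quote nor a backslash, repeated". Below is a minimal sketch of that pattern in Python rather than in Splunk itself; the group name `request` and the variable names are placeholders chosen for illustration. Inside a Splunk rex string the backslashes and quotes will probably need a further level of escaping, which is not verified here.

```python
import re

# Sample request field from the post above; the raw string keeps backslashes literal.
line = r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\""'

# Opening quote, then any run of:
#   \\.      a backslash followed by any character (covers \" and \\), or
#   [^"\\]   a character that is neither a double quote nor a backslash,
# then the closing quote. The group name "request" is hypothetical.
QUOTED = re.compile(r'"(?P<request>(?:\\.|[^"\\])*)"')

match = QUOTED.search(line)
request = match.group("request") if match else None
print(request)
```

Excluding the backslash from the negated class is the important part: it forces every backslash to be consumed together with the character after it, so an escaped quote can never be mistaken for the closing quote.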


lumpymilk
Explorer

Can someone explain how to handle the \" characters in a capture group when my field boundaries are double quotes? That's what I really need. It seems like Splunk has a problem when I escape the backslash and double quotes in my regex. Other regex tools are able to handle things like \"(?(\\"|[^\"])?)\" or \"(?(?:(\\"|[^\"]))?)\" just fine, but Splunk errors on it.


mpreddy
Communicator

Try something like this:

| stats c | eval _raw="2015-03-27T15:49:34 http://www.mydomain.com/request.pl?field2=value2&field1=value1&field4=value4&clientData=someVar:\"th... is the important data\"&field3=value3 data2"  |rex "^[^\?\n]*\?(?P<url_parameter>.*) "  | rex max_match=10 field=url_parameter "(?<url_parameter_field>\w+)=" | rex max_match=10 field=url_parameter "=(?<url_parameter_value>[0-9a-zA-Z\:\\\"\ ]*)" | fields - c

richgalloway
SplunkTrust

The regex "(?<url>.*)" works on regex101.com.

---
If this reply helps you, Karma would be appreciated.

lumpymilk
Explorer

Let me clarify a little. It is in fact a little more complicated than I originally stated.

The data is in w3c format. "(?.*)" would match but with the data looking like this ...

"data" "data" "data" data data data "http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" "other data" "more data"

\"(?[^\"]*?)\"\s\"(?[^\"]*?)\"\s\"(?[^\"]*?)\"\s(?\S*?)\s(?\S*?)\s(?\S*?)\s\"(?.*)\"

matches more than the request data.


richgalloway
SplunkTrust

If you just want the URL then "(?<url>http.*)" will match it.

If you're trying to match all of the fields, then you have a trickier problem because no single delimiter separates the fields. Space won't work because of embedded spaces and some fields aren't quoted.
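
One possible way around the missing single delimiter is to match fields one at a time instead of splitting: each field is either a quoted string (allowing escaped characters) or a run of non-space characters. The sketch below shows the idea in Python, not in Splunk; `split_fields` and `FIELD` are hypothetical names, and it assumes fields are separated by whitespace as in the sample line above.

```python
import re

# Sample event from the thread; the raw string keeps backslashes literal.
line = (
    r'"data" "data" "data" data data data '
    r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" '
    r'"other data" "more data"'
)

# A field is either:
#   group 1: a double-quoted value that may contain backslash-escaped characters, or
#   group 2: an unquoted run of non-whitespace characters.
# The quoted alternative comes first so a leading quote is never read as a bare token.
FIELD = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)')


def split_fields(text):
    """Return the fields of one event, with the surrounding quotes removed."""
    fields = []
    for m in FIELD.finditer(text):
        quoted, bare = m.group(1), m.group(2)
        # An empty quoted field ("") yields '' rather than None, so test against None.
        fields.append(quoted if quoted is not None else bare)
    return fields


fields = split_fields(line)
for index, value in enumerate(fields):
    print(index, value)
```

With the sample line this yields nine fields, and the request is the seventh one (index 6) with its escaped quotes intact.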
