Why does my regular expression ignore escaped double quotes in value?

lumpymilk
Explorer

When extracting the request or cookie from httpd logs, I'm having problems capturing an entire request when the request contains an escaped double quote. The cause appears to be how Splunk handles the sequence \".

For example, if the request field of the log contains this data:

"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\""

then the regular expression \"(?[^\"]*?)\" captures only http://www.mydomain.com/request.pl?clientData=someVar:\

If I try \"(?(?:(\x5c\x22|[^\"]))*?)\" then the search fails with an error saying "Please check log"... no details.

If I try \"(?(?:(\x5c\x21|[^\"]))*?)\" then the search completes with no error. Too bad \x21 isn't what I'm looking for.

If I try \"(?(?:(\x5c.|[^\"]))*?)\", hoping that ANY character preceded by a backslash will match, then I get the same error again.

The simple question is: how would one capture data between double quotes when the data may contain escaped double quotes?
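
For reference, a common way to write this kind of pattern is to let the body of the quoted string consist of either "a backslash followed by any character" or "any character that is neither a quote nor a backslash". Excluding the backslash from the second alternative is what stops the match from ending at the quote in \". The sketch below uses Python's `re` module only to show the pattern's behavior; it has not been run in Splunk, the function name `extract_quoted` is made up for illustration, and inside a `rex` command the pattern would likely need an extra layer of escaping for the quotes and backslashes because the regex itself sits inside a double-quoted SPL string.

```python
import re

# Opening quote, then zero or more of:
#   \\.      a backslash followed by any single character (covers \" and \\)
#   [^"\\]   any character that is neither a double quote nor a backslash
# and finally the closing quote. Group 1 is the text between the quotes.
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')


def extract_quoted(line):
    """Return the contents of the first double-quoted field, or None."""
    match = QUOTED.search(line)
    return match.group(1) if match else None


# Sample taken from the question above.
sample = r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\""'
print(extract_quoted(sample))
```

The captured value keeps the backslashes in front of the inner quotes; removing them would be a separate step.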

lumpymilk
Explorer

Can someone explain how to handle the \" characters in a capture group when my field boundaries are double quotes? That's what I really need. It seems like Splunk has a problem when I escape the backslash and double quotes in my regex. Other regex tools handle things like \"(?(\\"|[^\"])?)\" or \"(?(?:(\\"|[^\"]))?)\" just fine, but Splunk errors on them.

mpreddy
Communicator

Try something like this:

| stats c | eval _raw="2015-03-27T15:49:34 http://www.mydomain.com/request.pl?field2=value2&field1=value1&field4=value4&clientData=someVar:\"th... is the important data\"&field3=value3 data2"  |rex "^[^\?\n]*\?(?P<url_parameter>.*) "  | rex max_match=10 field=url_parameter "(?<url_parameter_field>\w+)=" | rex max_match=10 field=url_parameter "=(?<url_parameter_value>[0-9a-zA-Z\:\\\"\ ]*)" | fields - c

richgalloway
SplunkTrust

The regex "(?<url>.*)" works on regex101.com.

---
If this reply helps you, Karma would be appreciated.

lumpymilk
Explorer

Let me clarify a little. It is in fact a little more complicated than I originally stated.

The data is in W3C format. "(?.*)" would match, but the full log line looks like this:

"data" "data" "data" data data data "http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" "other data" "more data"

\"(?[^\"]*?)\"\s\"(?[^\"]*?)\"\s\"(?[^\"]*?)\"\s(?\S*?)\s(?\S*?)\s(?\S*?)\s\"(?.*)\"

matches more than the request data.

richgalloway
SplunkTrust

If you just want the URL then "(?<url>http.*)" will match it.

If you're trying to match all of the fields, then you have a trickier problem because no single delimiter separates the fields. Space won't work because of embedded spaces and some fields aren't quoted.
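
One way to approach the mixed quoted and unquoted fields, sketched here in Python rather than SPL, is to skip the single-delimiter idea and instead match one token at a time: either a double-quoted string that may contain backslash escapes, or a run of non-whitespace characters. The helper name `split_fields` is hypothetical, the sample line is the one posted above, and this has not been tested against real W3C logs or inside Splunk.

```python
import re

# A token is either:
#   "..."   a quoted field; \\. lets escaped characters such as \" stay inside it
#   \S+     an unquoted field with no embedded whitespace
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)')


def split_fields(line):
    """Split a log line into fields, dropping the surrounding quotes."""
    fields = []
    for match in TOKEN.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        # group(1) is None when the unquoted alternative matched.
        fields.append(quoted if quoted is not None else bare)
    return fields


line = (r'"data" "data" "data" data data data '
        r'"http://www.mydomain.com/request.pl?clientData=someVar:\"this is the important data\"" '
        r'"other data" "more data"')

for index, field in enumerate(split_fields(line)):
    print(index, field)
```

With this line, the request ends up as the seventh field (index 6), with its escaped quotes intact and without spilling into "other data".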
