Regex question extracting user from webserver log

mikelanghorst
Motivator

For this sample data:
172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - - [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"

For some of our webserver logs, we are logging the DN from the user certificate with %{SSL_CLIENT_S_DN}x.

The default extraction for user is [[nspaces:user]], so essentially (?<user>[^\s]+).

In trying to extract the different variations for the user field I came up with:

(?<user>([^\"\s]+|\"[^\"]+\"))
But that includes the " as part of the field. I haven't been able to come up with a regex that,
when the first character is a ", grabs everything up to but not including the closing ", and otherwise grabs everything up to the next space.

danielschroeder
Engager

You need to work with lookbehinds.

(?<user>(?<=\")[^\"]+|(?<!\")[^\s\"]+)
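
A minimal sketch for checking the lookbehind idea against the three sample lines, written in Python rather than SPL. Several things here are assumptions that are not in the thread: Python's re module spells named groups (?P<user>...) where rex uses (?<user>...); the ^\S+\s+\S+\s+"? prefix is added so that matching starts at the third field (unanchored, the second alternative would match the client IP first); and extract_user is a hypothetical helper name.

```python
import re

# Lookbehind pattern, anchored to the third whitespace-separated field.
# (?<=") succeeds only if the optional opening quote was consumed, so the
# quoted branch captures up to (not including) the closing quote.
USER_RE = re.compile(r'^\S+\s+\S+\s+"?(?P<user>(?<=")[^"]+|(?<!")[^\s"]+)')

SAMPLES = [
    '172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"',
    '172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"',
    '172.21.174.78 - - [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"',
]


def extract_user(line):
    """Return the user field without surrounding quotes, or None if no match."""
    match = USER_RE.match(line)
    return match.group("user") if match else None


for sample in SAMPLES:
    print(extract_user(sample))
# /dc=com/dc=caiso/OU=people/CN=Bob User
# mlanghor
# -
```

Whether rex behaves identically should be checked in Splunk itself, since this only exercises Python's regex engine.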

kristian_kolb
Ultra Champion

Would this work? Unescape the double quotes if needed.

^\S+\s+\S+\s+\"?(?<user>(?:([^\"]+)\"\s|([\S]+)\s+))

UPDATE:

Played around a little more with RegExr, and this looks good in there anyway (capture group 1 is OK).

^\S+\s+\S+\s+\"?(?<user>(?:(([^\"]+))|([\S]+)\s+))(?:\"\s\[|\s\[)

Wondering if it works,

/Kristian

mikelanghorst
Motivator

Seems closer, but it's retaining the closing quote.

mikelanghorst
Motivator

Finally got one working as I want:

(?:\"(?<user>[^\"]+)\"|(?<user>[^\s]+))

Or not: RegExr and Expresso work OK with this, but Splunk's rex command fails because the same group name appears in more than one block.

mikelanghorst
Motivator

While RegExr accepts it just fine, passing this to rex fails with:
Error in 'rex' command: Encountered the following error while compiling the regex '(?:(?:"(?<user>[^"]+)")|(?<user>[^\s]+))': Regex: two named subpatterns have the same name
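
The duplicate-name error is not specific to rex, and one possible way around it is to give each alternative its own group name and merge the two afterwards. The sketch below is in Python and rests on assumptions that are not from the thread: the names user_dn and user_plain are hypothetical, the ^\S+\s+\S+\s+ prefix is added to start at the third field, and Python writes named groups as (?P<name>...). In Splunk, the merge step could presumably be done after rex with eval's coalesce function, but that is untested here.

```python
import re

# Same shape as the failing pattern: both alternatives use the name "user".
DUPLICATE_NAMES = r'(?:"(?P<user>[^"]+)"|(?P<user>[^\s]+))'

# Workaround sketch: one name per alternative (hypothetical names).
TWO_NAMES = re.compile(
    r'^\S+\s+\S+\s+(?:"(?P<user_dn>[^"]+)"|(?P<user_plain>\S+))'
)


def compiles(pattern):
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def extract_user(line):
    """Return whichever of the two groups matched, or None if neither did."""
    match = TWO_NAMES.match(line)
    if match is None:
        return None
    return match.group("user_dn") or match.group("user_plain")


print(compiles(DUPLICATE_NAMES))  # False: Python also rejects the repeated name
print(extract_user('172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700]'))
print(extract_user('172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700]'))
print(extract_user('172.21.174.78 - - [11/May/2012:11:27:40 -0700]'))
```

The quoted alternative is listed first so that a DN containing spaces is captured whole, without its surrounding quotes, before the plain non-whitespace branch is tried.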
