Splunk Search

Regex question extracting user from webserver log

mikelanghorst
Motivator

For this sample data:
172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - - [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"

For some of our webserver logs, we are logging the DN from the user certificate with %{SSL_CLIENT_S_DN}x.

The default extraction for user is [[nspaces:user], so essentially (?[^\s]+).

In trying to extract the different variations for the user field I came up with:

(?<user>([^\"\s]+|\"[^\"]+\"))
But that includes the " as part of the field. I'm haven't been able to come up with a regex that"
when the first character is a " grab everything but not including the "'s, otherwise, grab everything till the next space.

Tags (1)

danielschroeder
Engager

You need to work with lookbehinds.

(?<user>(?<=\")[^\"]+|(?<!\")[^\s\"]+)

0 Karma

kristian_kolb
Ultra Champion

Would this work? Unescape the double quotes if needed.

^\S+\s+\S+\s+\"?(?<user>(?:([^\"]+)\"\s|([\S]+)\s+))

UPDATE:

Played around a little more with RegExr, and this looks good in there anyway (capture group 1 is OK).

^\S+\s+\S+\s+\"?(?<user>(?:(([^\"]+))|([\S]+)\s+))(?:\"\s\[|\s\[)

Wondering if it works,

/Kristian

0 Karma

mikelanghorst
Motivator

Seems closer, but it's retaining the closing quote.

0 Karma

mikelanghorst
Motivator

Finally got one working as I want:

(?:\"(?[^\"]+)\"|(?[^\s]+))

Or not, RegExr and Expresso works ok with this, but Splunk Rex command fails due to multiple blocks.

mikelanghorst
Motivator

while regexr accepts it just fine, passing this to rex fails with:
Error in 'rex' command: Encountered the following error while compiling the regex '(?:(?:"(?[^"]+)")|(?[^\s]+))': Regex: two named subpatterns have the same name

0 Karma
Get Updates on the Splunk Community!

See just what you’ve been missing | Observability tracks at Splunk University

Looking to sharpen your observability skills so you can better understand how to collect and analyze data from ...

Weezer at .conf25? Say it ain’t so!

Hello Splunkers, The countdown to .conf25 is on-and we've just turned up the volume! We're thrilled to ...

How SC4S Makes Suricata Logs Ingestion Simple

Network security monitoring has become increasingly critical for organizations of all sizes. Splunk has ...