Splunk Search

Regex question extracting user from webserver log

mikelanghorst
Motivator

For this sample data:
172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - - [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"

For some of our webserver logs, we are logging the DN from the user certificate with %{SSL_CLIENT_S_DN}x.

The default extraction for user is [[nspaces:user], so essentially (?[^\s]+).

In trying to extract the different variations for the user field I came up with:

(?<user>([^\"\s]+|\"[^\"]+\"))
But that includes the " as part of the field. I'm haven't been able to come up with a regex that"
when the first character is a " grab everything but not including the "'s, otherwise, grab everything till the next space.

Tags (1)

danielschroeder
Engager

You need to work with lookbehinds.

(?<user>(?<=\")[^\"]+|(?<!\")[^\s\"]+)

0 Karma

kristian_kolb
Ultra Champion

Would this work? Unescape the double quotes if needed.

^\S+\s+\S+\s+\"?(?<user>(?:([^\"]+)\"\s|([\S]+)\s+))

UPDATE:

Played around a little more with RegExr, and this looks good in there anyway (capture group 1 is OK).

^\S+\s+\S+\s+\"?(?<user>(?:(([^\"]+))|([\S]+)\s+))(?:\"\s\[|\s\[)

Wondering if it works,

/Kristian

0 Karma

mikelanghorst
Motivator

Seems closer, but it's retaining the closing quote.

0 Karma

mikelanghorst
Motivator

Finally got one working as I want:

(?:\"(?[^\"]+)\"|(?[^\s]+))

Or not, RegExr and Expresso works ok with this, but Splunk Rex command fails due to multiple blocks.

mikelanghorst
Motivator

while regexr accepts it just fine, passing this to rex fails with:
Error in 'rex' command: Encountered the following error while compiling the regex '(?:(?:"(?[^"]+)")|(?[^\s]+))': Regex: two named subpatterns have the same name

0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...