Splunk Search

Regex question extracting user from webserver log

mikelanghorst
Motivator

For this sample data:
172.21.174.78 - "/dc=com/dc=caiso/OU=people/CN=Bob User" [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - mlanghor [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"
172.21.174.78 - - [11/May/2012:11:27:40 -0700] "POST /APP/ClientWebService HTTP/1.0" 200 439 "-" "Mozilla/3.0 (compatible; Indy Library)"

For some of our webserver logs, we are logging the DN from the user certificate with %{SSL_CLIENT_S_DN}x.

The default extraction for user is [[nspaces:user], so essentially (?[^\s]+).

In trying to extract the different variations for the user field I came up with:

(?<user>([^\"\s]+|\"[^\"]+\"))
But that includes the " as part of the field. I'm haven't been able to come up with a regex that"
when the first character is a " grab everything but not including the "'s, otherwise, grab everything till the next space.

Tags (1)

danielschroeder
Engager

You need to work with lookbehinds.

(?<user>(?<=\")[^\"]+|(?<!\")[^\s\"]+)

0 Karma

kristian_kolb
Ultra Champion

Would this work? Unescape the double quotes if needed.

^\S+\s+\S+\s+\"?(?<user>(?:([^\"]+)\"\s|([\S]+)\s+))

UPDATE:

Played around a little more with RegExr, and this looks good in there anyway (capture group 1 is OK).

^\S+\s+\S+\s+\"?(?<user>(?:(([^\"]+))|([\S]+)\s+))(?:\"\s\[|\s\[)

Wondering if it works,

/Kristian

0 Karma

mikelanghorst
Motivator

Seems closer, but it's retaining the closing quote.

0 Karma

mikelanghorst
Motivator

Finally got one working as I want:

(?:\"(?[^\"]+)\"|(?[^\s]+))

Or not, RegExr and Expresso works ok with this, but Splunk Rex command fails due to multiple blocks.

mikelanghorst
Motivator

while regexr accepts it just fine, passing this to rex fails with:
Error in 'rex' command: Encountered the following error while compiling the regex '(?:(?:"(?[^"]+)")|(?[^\s]+))': Regex: two named subpatterns have the same name

0 Karma
Get Updates on the Splunk Community!

Dashboard Studio Challenge - Learn New Tricks, Showcase Your Skills, and Win Prizes!

Reimagine what you can do with your dashboards. Dashboard Studio is Splunk’s newest dashboard builder to ...

Introducing Edge Processor: Next Gen Data Transformation

We get it - not only can it take a lot of time, money and resources to get data into Splunk, but it also takes ...

Take the 2021 Splunk Career Survey for $50 in Amazon Cash

Help us learn about how Splunk has impacted your career by taking the 2021 Splunk Career Survey. Last year’s ...