Splunk Search

User Agent regex

dewald13
Path Finder

I'm trying to create a regex to match the user agent in the following logs, beginning with "Mozilla/*" and ending at the end of the UA string. The problem is that one user agent is so much longer than the other that I can't seem to match them both from " to ". I know these are a pain in the @$$ to deal with, but I was curious whether anyone had any suggestions or insight.

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service

0 Karma
1 Solution

dewald13
Path Finder

After troubleshooting for two days with Dave Shpritz, the creator of the TA-browscap app, we finally got it figured out. The final regex ended up being: \s\"(?P<http_user_agent>[^"]+)

The TA-browscap app supplies a lot of very useful information for the pesky user agent strings.

Big thanks to Dave on this one!
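
For anyone who wants to try the regex outside Splunk, here is a minimal sketch in Python, which accepts the same (?P<name>...) named-group syntax. The two events are copied from the question; the helper name extract_user_agent is made up for illustration and is not part of Splunk or TA-browscap.

```python
import re

# Regex from the post above: whitespace, an opening quote, then everything
# up to (but not including) the next double quote.
UA_RE = re.compile(r'\s\"(?P<http_user_agent>[^"]+)')

# Sample events copied from the question (addresses were already masked).
EVENTS = [
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp '
    'ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; '
    'MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; '
    '.NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" '
    'OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service',
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp '
    'ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) '
    'AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" '
    '- xxx.x.xxx.xxx SG-HTTP-Service',
]


def extract_user_agent(event):
    """Return the first quoted value in the event, or None if there is none."""
    match = UA_RE.search(event)
    return match.group("http_user_agent") if match else None


USER_AGENTS = [extract_user_agent(event) for event in EVENTS]

for user_agent in USER_AGENTS:
    print(user_agent)
```

This works for these events because the user agent is the first quoted field on the line. The long and short strings are handled the same way, since [^"]+ simply runs to the closing quote.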

0 Karma

monkeymole
Explorer

I have been looking for this answer for a while now. The regex posted above, \s\"(?P<http_user_agent>[^"]+),
does not return the user agent in Apache syslog files. Instead I added \" to the beginning; this way it matches the closing quote, a space character, and then an opening quote before picking up the field.
FINAL REGEX: \"\s\"(?P<http_user_agent>[^"]+)
Hope it helps,
monkeymole

edit: added the final regex to the answer.
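
A small Python sketch of why the extra \" matters. The log line below is a made-up example in Apache combined format (request, status, bytes, referer, user agent), not taken from a real server, and the variable names are only for illustration.

```python
import re

# Original regex: grabs the first quoted value on the line.
FIRST_QUOTED = re.compile(r'\s\"(?P<http_user_agent>[^"]+)')

# Variant with a leading \": requires a closing quote, whitespace, and an
# opening quote, so it skips ahead to a quoted value that follows another one.
AFTER_QUOTED = re.compile(r'\"\s\"(?P<http_user_agent>[^"]+)')

# Hypothetical Apache combined-format line (assumption, for illustration only).
LINE = (
    '192.0.2.10 - - [21/Feb/2013:22:39:29 +0000] '
    '"GET /index.html HTTP/1.1" 200 373 "http://example.com/" '
    '"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) '
    'Chrome/21.0.1180.79 Safari/537.1"'
)

first = FIRST_QUOTED.search(LINE).group("http_user_agent")
second = AFTER_QUOTED.search(LINE).group("http_user_agent")

print(first)   # the request line, not the user agent
print(second)  # the user agent
```

On a line like this the original regex picks up the request, because that is the first quoted field. The variant only works when the user agent is the first quoted field that directly follows another quoted field, which is the case here because the referer sits right before it.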

0 Karma

jgedeon120
Contributor

Something like Mozilla[^"]*

http://regexr.com?33s70
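
One thing to keep in mind with a bare pattern like this: it matches text but names no field, and Splunk field extractions need a named capture group to know which field to populate. A minimal Python sketch of the difference, using part of one event from the question; the field name http_user_agent is borrowed from the accepted solution, and this is only an illustration, not a Splunk configuration.

```python
import re

# Bare pattern as suggested: it matches, but defines no named field.
BARE = re.compile(r'Mozilla[^"]*')

# Same pattern wrapped in a named capture group.
NAMED = re.compile(r'(?P<http_user_agent>Mozilla[^"]*)')

# Tail of one event from the question.
EVENT = (
    '132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 '
    '(KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" '
    '- xxx.x.xxx.xxx SG-HTTP-Service'
)

bare_fields = BARE.search(EVENT).groupdict()
named_fields = NAMED.search(EVENT).groupdict()

print(bare_fields)   # no named groups, so no fields
print(named_fields)  # one field holding the user agent
```

Both patterns match the same text; only the second one says what to call it.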

dewald13
Path Finder

They are from BlueCoats in ELFF format.

0 Karma

jgedeon120
Contributor

I just looked at your logs again. I guess they are not access combined logs. If I get some time today I will try to come up with the extractions for the whole log message. What are these from if you don't mind me asking?

I guess they are.

0 Karma

jgedeon120
Contributor

Good point. Are you not using the default sourcetype for access_combined logs? It should already have this.


REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]

For some reason it is dropping the backslashes before the s characters.

0 Karma

dewald13
Path Finder

That worked perfectly in an online regex generator, but when I put it into Splunk as a field extraction it does not match anything. Any ideas?

0 Karma