User Agent regex

Path Finder

I'm trying to create a regex to match the user agent in the following logs, beginning with "Mozilla/*" and running to the end of the UA string. The problem is that one user agent is so much longer than the other that I can't seem to match them both from the opening " to the closing ". I know these are a pain in the @$$ to deal with, but I was curious whether anyone had any suggestions or insight.

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service

Path Finder

After two days of troubleshooting with Dave Shpritz, the creator of the TA-browscap app, we finally got it figured out. The final regex ended up being: \s\"(?P<http_user_agent>[^"]+)

The TA-browscap app supplies a lot of very useful information for the pesky user agent strings.

Big thanks to Dave on this one!
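
As a quick sanity check outside Splunk, the accepted regex can be run against the two sample events from the question. This is only a sketch in Python, whose `(?P<name>...)` named-group syntax accepts the regex unchanged; the helper name `extract_user_agent` is made up for illustration, and Splunk's own regex engine may differ in other respects.

```python
import re

# Accepted regex from this thread, unchanged: whitespace, an opening quote,
# then everything up to (not including) the next quote.
UA_RE = re.compile(r'\s\"(?P<http_user_agent>[^"]+)')

# The two sample events from the question, copied verbatim.
events = [
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service',
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service',
]


def extract_user_agent(event):
    """Return the first quoted string preceded by whitespace, or None."""
    match = UA_RE.search(event)
    return match.group("http_user_agent") if match else None


user_agents = [extract_user_agent(e) for e in events]
for ua in user_agents:
    print(ua)
```

The regex works for both events regardless of length because `[^"]+` simply runs until the closing quote, and in this ELFF layout the user agent is the first quoted field on the line.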

Explorer

I have been looking for this answer for a while now. The regex posted above, \s\"(?P<http_user_agent>[^"]+), does not return the user agent in Apache syslog files. Instead I added \" to the beginning; this way it matches a closing quote, a space character, and finally an opening quote before picking up the field.
FINAL REGEX: \"\s\"(?P<http_user_agent>[^"]+)
Hope it helps,
monkeymole

edit: added the final regex to the answer.
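
To illustrate why the extra \" matters, here is a small Python sketch comparing the two regexes. The log line is a made-up example in Apache combined format (an assumption; the post only says "apache syslog files"), where the request and referer are quoted fields that come before the user agent.

```python
import re

ORIGINAL = re.compile(r'\s\"(?P<http_user_agent>[^"]+)')
ANCHORED = re.compile(r'\"\s\"(?P<http_user_agent>[^"]+)')

# Hypothetical Apache combined-format line, invented for illustration only.
apache_line = (
    '127.0.0.1 - - [21/Feb/2013:22:39:29 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start.html" '
    '"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) '
    'Chrome/21.0.1180.79 Safari/537.1"'
)

# The original regex stops at the first quoted field, which is the request.
original_hit = ORIGINAL.search(apache_line).group("http_user_agent")

# The anchored regex needs a closing quote, a space, and an opening quote,
# which first occurs between the referer and the user agent.
anchored_hit = ANCHORED.search(apache_line).group("http_user_agent")

print(original_hit)
print(anchored_hit)
```

Note that the anchored form relies on the user agent directly following another quoted field, so it would not match the BlueCoat events in the question, where the user agent is preceded by an unquoted `-`.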

Contributor

Something like Mozilla[^"]*

http://regexr.com?33s70
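
One thing worth checking with a pattern like this: a Splunk field extraction only creates a field from a named capture group, and the bare pattern has none. Whether that explains the problem reported in this thread is a guess, but the Python sketch below shows the difference; the field name `http_user_agent` is borrowed from the accepted answer, and the event is the tail of the second sample from the question.

```python
import re

# Suggested pattern as posted: it matches, but captures nothing by name.
SUGGESTED = re.compile(r'Mozilla[^"]*')

# Same pattern wrapped in a named group (the group name is an assumption).
NAMED = re.compile(r'(?P<http_user_agent>Mozilla[^"]*)')

event = (
    '132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 '
    '(KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" '
    '- xxx.x.xxx.xxx SG-HTTP-Service'
)

plain_match = SUGGESTED.search(event)
named_match = NAMED.search(event)

plain_groups = plain_match.groupdict()   # no named groups, so no fields
named_groups = named_match.groupdict()   # one field: http_user_agent

print(plain_groups)
print(named_groups["http_user_agent"])
```

Unlike the quote-anchored regexes elsewhere in this thread, this pattern only finds user agents that begin with "Mozilla".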

Path Finder

They are from BlueCoats in ELFF format.

Contributor

I just looked at your logs again. I guess they are not access combined logs. If I get some time today I will try to come up with the extractions for the whole log message. What are these from if you don't mind me asking?

I guess they are.

Contributor

Good point. Are you not using the default sourcetype for access_combined logs? It should already have this.


REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]

For some reason it is dropping the back slashes before the s'es.

Path Finder

That worked perfectly in an online regex generator, but when I put it into Splunk as a field extraction it does not match anything. Any ideas?
