Splunk Search

User Agent regex

dewald13
Path Finder

I'm trying to create a regex to match the user agent from the following logs. Beginning with "Mozilla/*" and ending at the end of the UA string. The problem I'm having is that one is so much longer than the other one I cant seem to match them both from " to ". I know these are a pain in the @$$ to deal with but was curious if anyone had any suggestions/insight.

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service

Tags (3)
0 Karma
1 Solution

dewald13
Path Finder

After troubleshooting with the creator of the TA-browscap app, Dave Shpritz, for two days we finally got it figured out. The final regex ended up being; \s\"(?P<http_user_agent>[^"]+)

The TA-browscap app supplies a lot of very useful information for the pesky user agent strings.

Big thanks to Dave on this one!

View solution in original post

0 Karma

monkeymole
Explorer

I have been looking for this answer for a while now. The regex posted above; \s\"(?P[^"]+)
does not return the user agent in apache syslog files, instead I added \" to the beginning, this way it will match the close quotes, space character and finally an open quotes, before picking up field.
FINAL REGEX: \"\s\"(?P<http_user_agent>[^"]+)
Hope it helps,
monkeymole

edit: added the final regex to the answer.

0 Karma

dewald13
Path Finder

After troubleshooting with the creator of the TA-browscap app, Dave Shpritz, for two days we finally got it figured out. The final regex ended up being; \s\"(?P<http_user_agent>[^"]+)

The TA-browscap app supplies a lot of very useful information for the pesky user agent strings.

Big thanks to Dave on this one!

View solution in original post

0 Karma

jgedeon120
Contributor

Something like Mozilla[^"]*

http://regexr.com?33s70

dewald13
Path Finder

They are from BlueCoats in ELFF format.

0 Karma

jgedeon120
Contributor

I just looked at your logs again. I guess they are not access combined logs. If I get some time today I will try to come up with the extractions for the whole log message. What are these from if you don't mind me asking?

I guess they are.

0 Karma

jgedeon120
Contributor

Good point. Are you not used the default sourcetype for access_combined logs? It should already have this.


REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]

For some reason it is dropping the back slashes before the s'es.

0 Karma

dewald13
Path Finder

That worked perfect using a regex generator online but when I put that into Splunk as a field extraction it does not match anything. Any ideas???

0 Karma