Splunk Search

User Agent regex

dewald13
Path Finder

I'm trying to create a regex to match the user agent from the following logs. Beginning with "Mozilla/*" and ending at the end of the UA string. The problem I'm having is that one is so much longer than the other one I cant seem to match them both from " to ". I know these are a pain in the @$$ to deal with but was curious if anyone had any suggestions/insight.

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service

2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service

Tags (3)
0 Karma
1 Solution

dewald13
Path Finder

After troubleshooting with the creator of the TA-browscap app, Dave Shpritz, for two days we finally got it figured out. The final regex ended up being; \s\"(?P<http_user_agent>[^"]+)

The TA-browscap app supplies a lot of very useful information for the pesky user agent strings.

Big thanks to Dave on this one!

View solution in original post

0 Karma

monkeymole
Explorer

I have been looking for this answer for a while now. The regex posted above; \s\"(?P[^"]+)
does not return the user agent in apache syslog files, instead I added \" to the beginning, this way it will match the close quotes, space character and finally an open quotes, before picking up field.
FINAL REGEX: \"\s\"(?P<http_user_agent>[^"]+)
Hope it helps,
monkeymole

edit: added the final regex to the answer.

0 Karma

dewald13
Path Finder

After troubleshooting with the creator of the TA-browscap app, Dave Shpritz, for two days we finally got it figured out. The final regex ended up being; \s\"(?P<http_user_agent>[^"]+)

The TA-browscap app supplies a lot of very useful information for the pesky user agent strings.

Big thanks to Dave on this one!

0 Karma

jgedeon120
Contributor

Something like Mozilla[^"]*

http://regexr.com?33s70

dewald13
Path Finder

They are from BlueCoats in ELFF format.

0 Karma

jgedeon120
Contributor

I just looked at your logs again. I guess they are not access combined logs. If I get some time today I will try to come up with the extractions for the whole log message. What are these from if you don't mind me asking?

I guess they are.

0 Karma

jgedeon120
Contributor

Good point. Are you not used the default sourcetype for access_combined logs? It should already have this.


REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[nspaces:bytes]?[[all:other]]

For some reason it is dropping the back slashes before the s'es.

0 Karma

dewald13
Path Finder

That worked perfect using a regex generator online but when I put that into Splunk as a field extraction it does not match anything. Any ideas???

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...

Step into “Hunt the Insider: An Splunk ES Premier Mystery” to catch a cybercriminal ...

After a whole week of being on call, you fell asleep on your keyboard, and you hit a sequence of buttons that ...