I'm trying to create a regex to match the user agent in the following logs, beginning with "Mozilla/" and ending at the end of the UA string. The problem is that one UA string is much longer than the other, and I can't seem to match both from the opening " to the closing ". I know these are a pain in the @$$ to deal with, but I was curious whether anyone had any suggestions or insight.
2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service
2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service
After two days of troubleshooting with Dave Shpritz, the creator of the TA-browscap app, we finally got it figured out. The final regex ended up being: \s\"(?P<http_user_agent>[^"]+)
The TA-browscap app supplies a lot of useful information about those pesky user agent strings.
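To sanity-check the pattern outside Splunk, here is a minimal Python sketch that runs it against the two sample lines from the question. Python's re module uses the same (?P<name>...) named-group syntax, and the escaped \" is just a literal quote, so the pattern is copied unchanged. The function name extract_user_agent is purely illustrative; this is not how Splunk applies the extraction internally.

```python
import re

# The two sample Blue Coat (ELFF) lines from the question, IPs masked as posted.
LOG_LINES = [
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; InfoPath.2; .NET4.0C; .NET4.0E)" OBSERVED "Search Engines/Portals" - 1xx.xx.xx.xxx SG-HTTP-Service',
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" - xxx.x.xxx.xxx SG-HTTP-Service',
]

# Whitespace, an opening quote, then every character up to (not including)
# the next quote. [^"]+ is what makes the length of the UA irrelevant.
UA_REGEX = re.compile(r'\s\"(?P<http_user_agent>[^"]+)')


def extract_user_agent(line):
    """Return the first quoted field that follows whitespace, or None."""
    match = UA_REGEX.search(line)
    return match.group("http_user_agent") if match else None


if __name__ == "__main__":
    for line in LOG_LINES:
        print(extract_user_agent(line))
```

In these Blue Coat lines the UA is the first quoted field, so search() stops there and never reaches the later "Search Engines/Portals" field.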
Big thanks to Dave on this one!
I have been looking for this answer for a while now. The regex posted above does not return the user agent in Apache syslog files, so I added \" to the beginning. This way it matches the closing quote, a space character, and finally an opening quote before capturing the field.
FINAL REGEX: \"\s\"(?P<http_user_agent>[^"]+)
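A small Python sketch of why the extra \" helps, assuming a standard Apache "combined" line. The log line below is made up for illustration (hypothetical IP, path and referer). In that layout the request line is the first quoted field, so \s\" grabs the request, while \"\s\" only matches the quote-space-quote between the referer and the user agent.

```python
import re

# Hypothetical Apache "combined" log line, invented for this example:
# client ident user [time] "request" status bytes "referer" "user agent"
APACHE_LINE = (
    '203.0.113.7 - - [21/Feb/2013:22:39:29 -0500] '
    '"GET /index.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/start.html" '
    '"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1"'
)

# Earlier regex: the first whitespace + quote here opens the request line.
FIRST_QUOTED = re.compile(r'\s\"(?P<http_user_agent>[^"]+)')

# monkeymole's regex: closing quote, whitespace, opening quote. In the
# combined layout this first occurs between the referer and the user agent.
QUOTE_SPACE_QUOTE = re.compile(r'\"\s\"(?P<http_user_agent>[^"]+)')


def first_capture(pattern, line):
    """Return the http_user_agent group of the first match, or None."""
    match = pattern.search(line)
    return match.group("http_user_agent") if match else None


if __name__ == "__main__":
    print(first_capture(FIRST_QUOTED, APACHE_LINE))       # the request line
    print(first_capture(QUOTE_SPACE_QUOTE, APACHE_LINE))  # the user agent
```

The trade-off is that this pattern depends on the combined layout. On the Blue Coat lines in the question the UA is preceded by - " rather than " ", so \"\s\" would not match the UA there.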
Hope it helps,
monkeymole
edit: added the final regex to the answer.
Something like Mozilla[^"]*
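The key part of that suggestion is [^"]* rather than .*: the negated character class stops at the closing quote no matter how long the UA is, while a greedy .* runs to the end of the line. A minimal Python comparison on the second sample line from the question:

```python
import re

# Second sample line from the question (Blue Coat ELFF, IPs masked as posted).
LINE = (
    '2013-02-21 22:39:29 26 xxx 200 TCP_ACCELERATED 39 373 CONNECT tcp ssl.gstatic.com 443 / - - - - 132.x.xx.134 - '
    '"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1" '
    '- xxx.x.xxx.xxx SG-HTTP-Service'
)

# Stops just before the closing quote, whatever the UA length.
BOUNDED = re.compile(r'Mozilla[^"]*')
# Greedy: overruns the closing quote and swallows the rest of the line.
GREEDY = re.compile(r'Mozilla.*')

if __name__ == "__main__":
    print(BOUNDED.search(LINE).group(0))
    print(GREEDY.search(LINE).group(0))
```

One caveat: anchoring on the literal "Mozilla" misses clients whose UA does not start with it (command-line tools such as curl, for example), which is one reason the quote-based regex above is more general.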
They are from Blue Coat proxies, in ELFF format.
I just looked at your logs again. I guess they are not access_combined logs. If I get some time today I will try to come up with extractions for the whole log message. What are these from, if you don't mind me asking?
I guess they are.
Good point. Are you not using the default sourcetype for access_combined logs? It should already have this extraction.
REGEX = ^[[nspaces:clientip]]\s++[[nspaces:ident]]\s++[[nspaces:user]]\s++[[sbstring:req_time]]\s++[[access-request]]\s++[[nspaces:status]]\s++[[nspaces:bytes]]?[[all:other]]
For some reason it is dropping the backslashes in front of the s characters (the \s tokens).
That worked perfectly in an online regex tester, but when I put it into Splunk as a field extraction it does not match anything. Any ideas?