All:
I am trying to chart the browsers used with my app, based on the "useragent" field from access_combined (Apache logs), like this:
sourcetype="access_combined" useragent!="-" AND useragent!="Apache" AND useragent!="Load-weight" AND useragent!="Java" AND useragent!="Jakarta Commons-HttpClient" | stats count(eval(match(useragent, "Firefox"))) as "Firefox", count(eval(match(useragent, "Chrome"))) as "Chrome", count(eval(match(useragent, "Safari"))) as "Safari", count(eval(match(useragent, "MSIE"))) as "IE", count(eval(NOT match(useragent, "Chrome|Firefox|Safari|MSIE"))) as "Other"
The problem is that the actual log entry looks like this:
For firefox:
97.76.108.114 - - [11/Jul/2012:08:36:37 -0700] "POST /forgotPassword HTTP/1.1" 200 3799 "https://www.easycareonline.com/forgotPassword" "Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"
For IE:
97.76.108.114 - - [11/Jul/2012:08:36:37 -0700] "POST /forgotPassword HTTP/1.1" 200 3799 "https://www.easycareonline.com/forgotPassword" "MSIE 8.0; Windows NT 5.2; Trident/4.0"
Both of these get filed under "Other" because the actual value of useragent is:
"Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1"
AND
"MSIE 8.0; Windows NT 5.2; Trident/4.0"
Any ideas on how to go about this? Maybe regexes?
The problem with using field extractions is that there is no set standard at all for what a UA (User Agent) string should look like. I wonder what the Chrome entry looks like (obviously we have none yet).
Because there are so many user agents, it may be worthwhile to use a lookup file to determine the browser, refreshing it periodically when you start getting too many "misses".
After a bit of a look I stumbled upon this site: http://browsers.garykeith.com/downloads, which has a comprehensive list of user agents and (with a bit of vi trickery) could easily be converted into a lookup that lets you determine a user's browser make & version from the user agent in the event.
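As a rough sketch of what the lookup approach might look like (the lookup name user_agents, the file name user_agents.csv, and the output field browser are all made up for illustration, and the four patterns are just placeholders for whatever you generate from the full list), you could define a wildcard lookup in transforms.conf:

```
# transforms.conf
# "user_agents" and "user_agents.csv" are hypothetical names
[user_agents]
filename = user_agents.csv
# treat the useragent column in the CSV as a wildcard pattern
match_type = WILDCARD(useragent)
# stop at the first matching row, in file order
max_matches = 1
```

with a CSV along these lines:

```
useragent,browser
*Firefox/*,Firefox
*MSIE *,IE
*Chrome/*,Chrome
*Safari/*,Safari
```

and then search with something like:

```
sourcetype="access_combined" useragent!="-" | lookup user_agents useragent OUTPUT browser | fillnull value="Other" browser | stats count by browser
```

Row order matters here: Chrome user agent strings also contain "Safari", so the Chrome row has to come before the Safari row for max_matches = 1 to pick the right one. Anything that ends up in "Other" is a candidate for a new row the next time you refresh the file.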
Hope this helps 🙂