I need help determining the OS and browsers that appear in our logs. I understand the easiest approach is to use the Splunkbase app that does exactly this (I believe it's called TA-ua parser), or an external script (I've seen many answers point to an external Python script on GitHub), but unfortunately I do not have the access rights to install these incredibly useful tools, so please do not offer links to these types of resources.
I know it will be a nasty regular expression, if a regular expression can even handle it. If you have an idea for one that might work, please let me know. However, I am wondering if there is another way around this. Perhaps there is some way to simplify the UA string just enough to gather the OS and/or the browser used (preferably the browser, if the technique allows only one to be determined). Maybe if I approach the problem from a less Splunk-specific standpoint, as a general decomposition of UA strings, I will be able to come up with a Splunk-specific solution.
Any help or guidance to a potential solution will be much appreciated. Thank you!
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/8.0.3 Safari/600.3.18
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36
Ideally, these regex strings should be matched against the user agent field:
\((?:KHTML, [\w ]+\)|compatible;|Android [\d\.]+; [\w \.;:]+\)) +(?<browser>[\w ]+)[\/ ]+(?<browser_ver>[\w\.+]+) \((?:compatible;[^;]+; |Linux; )?(?<os>[a-zA-Z]+(?: NT)?) ?(?<os_ver>[\d\.]+)
What I would do (and have done) is use a list like this one to get a handle on all the possibilities: http://www.useragentstring.com/pages/All/
Then use something like this to categorize first:
...|eval blah = if(match(useragent,"Windows"),"Windows", if(match(useragent, "X11"),"Linux", if(match(useragent, "Macintosh"),"MAC", "OTHER"))) etc
That way you don't have to do any crazy positioning, because it's basically a keyword search. You can make that eval as long as you like (watch the number of closing parens as you go) and make your regex more granular (the second parameter in match() is a regex).
User agent formats are kind of the wild, wild west, so you want to be able to see what ends up in "OTHER" and add to your list as you go.
There is really no clean way to deal with these things, especially when you start adding mobile OS stuff.
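The keyword approach above can be sketched outside Splunk to check the rule order before building the eval. This Python is only an illustration of the logic, not something to install in Splunk. The names OS_RULES, classify_os, and unmatched are hypothetical, and the three rules are the ones from the eval above. Each rule corresponds to one match() branch, and the first match wins.

```python
import re

# Ordered (regex, label) rules; the first matching rule wins,
# mirroring the nested if(match(...)) chain in the eval.
OS_RULES = [
    (r"Windows", "Windows"),
    (r"X11", "Linux"),
    (r"Macintosh", "MAC"),
]


def classify_os(useragent):
    """Return the label of the first rule whose regex is found in the string."""
    for pattern, label in OS_RULES:
        if re.search(pattern, useragent):
            return label
    return "OTHER"


def unmatched(useragents):
    """Return the strings that fell through to OTHER, so new rules can be added."""
    return [ua for ua in useragents if classify_os(ua) == "OTHER"]
```

Reviewing the output of unmatched() on a sample of real log values is the same idea as checking what ends up in "OTHER" in the search results.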
Excellent suggestion. I'm doing a project related to this, and this also solves a secondary problem I have: when users interact with the dashboard, they won't have to know what Windows NT 6.1 is; they can just see Windows 7.
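The NT-version-to-name translation mentioned here could look like the following Python sketch. The names NT_NAMES and friendly_windows are hypothetical, and the table covers only common desktop releases; unknown versions fall back to the raw NT number so they stay visible. In Splunk the same table could be expressed as extra match() branches or a lookup, which is an assumption about how you would wire it in.

```python
import re

# Windows NT kernel versions and the product names they shipped as.
NT_NAMES = {
    "5.1": "Windows XP",
    "6.0": "Windows Vista",
    "6.1": "Windows 7",
    "6.2": "Windows 8",
    "6.3": "Windows 8.1",
    "10.0": "Windows 10",  # Windows 11 also reports NT 10.0
}


def friendly_windows(useragent):
    """Return a readable Windows name, or None if the string is not Windows NT."""
    m = re.search(r"Windows NT (\d+\.\d+)", useragent)
    if m is None:
        return None
    version = m.group(1)
    # Unknown versions keep the raw number instead of being guessed.
    return NT_NAMES.get(version, "Windows NT " + version)
```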
Do two separate field extractions: one for the browser and the other for the OS.
answers.splunk.com is not allowing the full regular expression to be posted for some reason. Put < > before and after OSextraction.
I haven't tested this, but it should work. If not, send a few more lines of sample data and I'll fix it up.
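Since the regex in this answer did not survive posting, here is a separate Python sketch of the browser half of the two-extraction idea. It is not the poster's expression. BROWSER_RULES and parse_browser are hypothetical names, and the patterns are assumptions based on the sample strings in the question plus the usual Firefox token. Order matters: Chrome strings also contain "Safari/", so the Chrome rule has to come before the Safari rule, and IE 11 has no "MSIE" token, so it is caught through "Trident/" and "rv:".

```python
import re

VERSION = r"(\d+(?:\.\d+)*)"

# Ordered (name, regex) rules; each regex captures the version in group 1.
BROWSER_RULES = [
    ("Internet Explorer", r"MSIE " + VERSION),            # IE 10 and older
    ("Internet Explorer", r"Trident/.*rv:" + VERSION),    # IE 11
    ("Firefox", r"Firefox/" + VERSION),
    ("Chrome", r"Chrome/" + VERSION),                     # before Safari on purpose
    ("Safari", r"Version/" + VERSION + r" Safari/"),
]


def parse_browser(useragent):
    """Return (browser, version) from the first matching rule, else ("OTHER", None)."""
    for name, pattern in BROWSER_RULES:
        m = re.search(pattern, useragent)
        if m:
            return name, m.group(1)
    return "OTHER", None
```

Each rule would translate to its own match() branch or rex extraction in a search, with the OS handled by a second, independent set of rules.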
Posted, thanks for the help. The main struggle I have with a regex is that there are many different types of UAs of varying length and structure. I'm wondering if there's an easier way to break them down. I just need the OS and/or browser, so if there are blatantly indicative qualities I'd like to leverage them, but I've spent a lot of time trying to do that and can't quite get it, especially if I am interested in the version (a plus).