How to extract the OS and Browser from a user agent string in our logs without a Splunk app or external script?

julianj · ‎07-29-2015

Hello Experts,

I need help determining the OS and browsers that appear in our logs. I understand the easiest thing to do is to use the app from Splunkbase that does exactly this (I believe it's called TA-ua parser), or use an external script (I've seen a lot of answers point to an external Python script on GitHub), but unfortunately I do not have the access rights to install these incredibly useful tools, so please do not offer links to these types of resources.

I know it will be a nasty regular expression, if a regular expression can even handle it. If you have an idea for one that might work, please let me know. However, I am wondering if there is another way around this. Perhaps there is some way to simplify the UA string just enough to gather the OS and/or the browser (preferably the browser, if the technique only allows one to be determined). Maybe if I approach the problem as a general decomposition of UA strings rather than from a Splunk-specific standpoint, I will be able to come up with a Splunk-specific solution.

Any help or guidance to a potential solution will be much appreciated. Thank you!

Sample logs:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/8.0.3 Safari/600.3.18
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36

landen99 · ‎11-01-2016

Ideally, these regex strings should be matched against the user agent field:

\((?:KHTML, [\w ]+\)|compatible;|Android [\d\.]+; [\w \.;:]+\)) +(?<browser>[\w ]+)[\/ ]+(?<browser_ver>[\w\.+]+)
\((?:compatible;[^;]+; |Linux; )?(?<os>[a-zA-Z]+(?: NT)?) ?(?<os_ver>[\d\.]+)
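A minimal sketch of how these two patterns could be applied with rex, assuming the user agent has already been extracted into a field named useragent (the field name is an assumption; the question does not give one). makeresults generates throwaway events from two of the sample logs so the search can be run without touching an index.

```spl
| makeresults
| eval useragent=split("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)|Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36", "|")
| mvexpand useragent
| rex field=useragent "\((?:KHTML, [\w ]+\)|compatible;|Android [\d\.]+; [\w \.;:]+\)) +(?<browser>[\w ]+)[\/ ]+(?<browser_ver>[\w\.+]+)"
| rex field=useragent "\((?:compatible;[^;]+; |Linux; )?(?<os>[a-zA-Z]+(?: NT)?) ?(?<os_ver>[\d\.]+)"
| table useragent browser browser_ver os os_ver
```

Tracing the patterns by hand against the samples in the question, the MSIE 9 and Chrome lines should yield browser, version, and "Windows NT" 6.1. The Trident/7.0 (IE 11) line has no "compatible;" or "KHTML" marker, so browser would stay empty, and the Safari line would report "Version" as the browser because that token follows "(KHTML, like Gecko)". Treat the output as a starting point and check it against your own data.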

rsennett_splunk · ‎07-30-2015

What I would do (and have done) is use a list like this: http://www.useragentstring.com/pages/All/

to get a handle on all the possibilities, and then use something like this to categorize first:
... | eval blah = if(match(useragent, "Windows"), "Windows", if(match(useragent, "X11"), "Linux", if(match(useragent, "Macintosh"), "MAC", "OTHER"))) etc.

That way you don't have to do any crazy positioning, because it's basically a keyword search. You can make that eval as long as you like (watch the number of closing parens as you go) and make your regex more granular (the second parameter in match() is a regex).
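The same keyword approach can be written with case() instead of nested if() calls, which removes the paren counting. This is only a sketch: the field name useragent, the keyword list, and the labels are illustrative assumptions. case() returns the value paired with the first condition that is true, so order matters. Chrome is tested before Safari because Chrome user agents also contain "Safari/", and "MSIE|Trident/" is used because the IE 11 sample in the question carries Trident but not MSIE.

```spl
| makeresults
| eval useragent="Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
| eval os=case(
    match(useragent, "Windows"), "Windows",
    match(useragent, "Macintosh"), "MAC",
    match(useragent, "X11"), "Linux",
    true(), "OTHER")
| eval browser=case(
    match(useragent, "MSIE|Trident/"), "Internet Explorer",
    match(useragent, "Chrome/"), "Chrome",
    match(useragent, "Firefox/"), "Firefox",
    match(useragent, "Safari/"), "Safari",
    true(), "OTHER")
| table useragent os browser
```

For the sample line used here, os should come out as Windows and browser as Internet Explorer. Replace the makeresults and first eval lines with your own base search to run it against real events.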

User agent formats are kind of the wild, wild west, so you want to be able to see what ends up in "OTHER" and add to your list as you go...
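One way to review what lands in the catch-all bucket is to filter on it and count the raw strings. In this sketch, index=your_index and sourcetype=your_sourcetype are hypothetical placeholders for your own base search, and useragent is an assumed field name.

```spl
index=your_index sourcetype=your_sourcetype
| eval os=if(match(useragent, "Windows"), "Windows", if(match(useragent, "X11"), "Linux", if(match(useragent, "Macintosh"), "MAC", "OTHER")))
| where os="OTHER"
| stats count by useragent
| sort - count
```

The most frequent unclassified strings appear at the top, which shows which keyword to add to the eval next.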

There is really no clean way to deal with these things especially when you start adding mobile os stuff...

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

skoelpin · ‎07-31-2015

Excellent suggestion. I'm doing a project related to this, and this also solves a secondary problem I have. When users interact with the dashboard, they won't have to know what Windows NT 6.1 is; they can just see Windows 7.
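A sketch of that friendly-name mapping, assuming a field named useragent. The Windows NT version numbers are the standard ones (6.1 is Windows 7, and so on); the output field name os_name and the "OTHER" label are illustrative.

```spl
| makeresults
| eval useragent="Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
| eval os_name=case(
    match(useragent, "Windows NT 10\.0"), "Windows 10",
    match(useragent, "Windows NT 6\.3"), "Windows 8.1",
    match(useragent, "Windows NT 6\.2"), "Windows 8",
    match(useragent, "Windows NT 6\.1"), "Windows 7",
    match(useragent, "Windows NT 6\.0"), "Windows Vista",
    match(useragent, "Windows NT 5\.1"), "Windows XP",
    match(useragent, "Mac OS X"), "Mac OS X",
    true(), "OTHER")
| table useragent os_name
```

For the sample line used here, os_name should be Windows 7, which is the label a dashboard user would see.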

skoelpin · ‎07-30-2015

Do 2 separate field extractions.. One for your browser and the other for the OS..

(?P<OSextraction>(?<=Mozilla\/\d\.\d).*(?=))

answers.splunk.com is not allowing the full regular expression to be posted for some reason.. Put < > before and after OSextraction

I haven't tested this but it should work.. If not then send a few more lines of sample data and I'll fix it up
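Since the regex above was cut off by the forum, the following does not try to reconstruct it. It is a separate sketch of the "two extractions" idea, assuming a field named useragent: the first rex captures everything inside the first set of parentheses (where the OS details sit in the sample logs), and the second captures a browser token and its version. The field names os_section, browser, and browser_ver and the short browser list are assumptions for illustration.

```spl
| makeresults
| eval useragent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36"
| rex field=useragent "^Mozilla/\d\.\d \((?<os_section>[^)]+)\)"
| rex field=useragent "(?<browser>MSIE|Chrome|Firefox)[/ ](?<browser_ver>[\d.]+)"
| table useragent os_section browser browser_ver
```

For this sample, os_section should be "Windows NT 6.1; WOW64", browser should be Chrome, and browser_ver should be 44.0.2403.107. Safari and IE 11 are not covered by the short browser list and would need their own handling.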

sk314 · ‎07-29-2015

Post some sample logs or sample extracted user agent strings, and I'm sure one of us could help you with the regex.

julianj · ‎07-29-2015

Posted, thanks for the help. The main struggle I have with a regex is that there are many different types of UAs of varying length and structure... I'm wondering if there's an easier way to break them down. I just need OS and/or browser, so if there are blatantly indicative qualities I'd like to leverage them, but I've spent a lot of time trying to do that and can't quite get it, especially if I am interested in the version (a plus).

richgalloway · ‎07-29-2015

I think I see multiple browser strings in some of the logs. Which do you want?


julianj · ‎07-30-2015

These are the UAs I'm given. There are even a few other ones that pop up on rare occasions, but I need a way to determine the browser and OS from all of them.

