Splunk Search

How to extract the OS and Browser from a user agent string in our logs without a Splunk app or external script?

julianj
Explorer

Hello Experts,

I need help determining the OSes and browsers that appear in our logs. I understand the easiest thing to do is to use the app from Splunkbase that does exactly this (I believe it's called TA-ua parser), or use an external script (I've seen a lot of answers point to an external Python script on GitHub), but unfortunately I do not have enough access rights to install these incredibly useful tools, so please do not offer links to these types of resources.

I know it will be a nasty regular expression, if a regular expression can even handle it. If you have an idea for one that might work, please let me know. However, I am wondering if there is another way around this. Perhaps there is some way to simplify the UA string just enough to at least gather the OS and/or the browser used (preferably the browser, if the technique only allows one to be determined). Maybe if I look at the problem from a less Splunk-specific standpoint, as a general decomposition of UA strings, I will be able to come up with a Splunk-specific solution.

Any help or guidance to a potential solution will be much appreciated. Thank you!

Sample logs:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/8.0.3 Safari/600.3.18
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36


landen99
Motivator

Ideally, these regex strings should be matched against the user agent field:

\((?:KHTML, [\w ]+\)|compatible;|Android [\d\.]+; [\w \.;:]+\)) +(?<browser>[\w ]+)[\/ ]+(?<browser_ver>[\w\.+]+)
\((?:compatible;[^;]+; |Linux; )?(?<os>[a-zA-Z]+(?: NT)?) ?(?<os_ver>[\d\.]+)
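A minimal sketch of how these two patterns could be applied with rex, assuming the user agent has already been extracted into a field named useragent (the field name used elsewhere in this thread; yours may differ). The makeresults and eval lines are only there so the search can be run on its own against one of the sample logs.

```spl
| makeresults
| eval useragent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36"
| rex field=useragent "\((?:KHTML, [\w ]+\)|compatible;|Android [\d\.]+; [\w \.;:]+\)) +(?<browser>[\w ]+)[\/ ]+(?<browser_ver>[\w\.+]+)"
| rex field=useragent "\((?:compatible;[^;]+; |Linux; )?(?<os>[a-zA-Z]+(?: NT)?) ?(?<os_ver>[\d\.]+)"
| table useragent browser browser_ver os os_ver
```

Traced by hand against the sample logs in the question, the patterns appear to cover the MSIE 9 and Chrome lines, but not everything: the IE 11 line (Trident/7.0; rv:11.0) has none of the anchors the browser pattern looks for, the Safari line would report "Version" as the browser, and the Macintosh line has no version number directly after the OS word, so os would stay empty. Treat it as a starting point to extend rather than a complete parser.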

rsennett_splunk
Splunk Employee

What I would do (and have done) is use a list like this to get a handle on all the possibilities: http://www.useragentstring.com/pages/All/

Then use something like this to categorize first:
... | eval blah = if(match(useragent,"Windows"),"Windows", if(match(useragent, "X11"),"Linux", if(match(useragent, "Macintosh"),"MAC", "OTHER")))
(and so on for any other keywords you need)

That way you don't have to do any crazy positioning, because it's basically a keyword search. You can make that eval as long as you like (watch the number of closing parens as you go) and make your regex more granular (the second parameter in match() is a regex).
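The same keyword idea can be written with case() instead of nested if(), which avoids counting closing parens. This is only a sketch: the field names os and browser and the keyword lists are assumptions, and the order of the tests matters (Android user agents also contain "Linux", and Chrome user agents also contain "Safari", so the more specific keyword has to come first).

```spl
| makeresults
| eval useragent=mvappend(
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/8.0.3 Safari/600.3.18",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36")
| mvexpand useragent
| eval os=case(
    match(useragent, "Windows"), "Windows",
    match(useragent, "Android"), "Android",
    match(useragent, "X11|Linux"), "Linux",
    match(useragent, "Macintosh"), "Mac",
    true(), "OTHER")
| eval browser=case(
    match(useragent, "MSIE|Trident"), "Internet Explorer",
    match(useragent, "Firefox"), "Firefox",
    match(useragent, "Chrome"), "Chrome",
    match(useragent, "Safari"), "Safari",
    true(), "OTHER")
| table useragent os browser
```

The makeresults, mvappend, and mvexpand lines just build test events from the sample logs; in a real search they would be replaced by your base search. The true() branch at the end plays the role of the final "OTHER" in the nested if().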

User agent formats are kind of a wild, wild west, so you want to be able to see what ends up in "OTHER" and add to your list as you go.
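One way to review what falls through, sketched under the assumption that the field is called useragent; index=your_index and sourcetype=your_sourcetype are placeholders, not real names.

```spl
index=your_index sourcetype=your_sourcetype
| eval os=case(
    match(useragent, "Windows"), "Windows",
    match(useragent, "X11"), "Linux",
    match(useragent, "Macintosh"), "MAC",
    true(), "OTHER")
| where os="OTHER"
| stats count by useragent
| sort - count
```

This lists the uncategorized user agent strings with the most frequent first, so the keywords worth adding next are at the top.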

There is really no clean way to deal with these things, especially when you start adding mobile OS stuff.

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

skoelpin
SplunkTrust

Excellent suggestion. I'm doing a project related to this, and this also solves a secondary problem I have: when users interact with the dashboard, they won't have to know what Windows NT 6.1 is, they can just see Windows 7.
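A rough sketch of that friendly-name mapping, using the same keyword approach. The field name os_name is an assumption, and the list only covers the common Windows NT version tokens; anything else lands in "OTHER".

```spl
| makeresults
| eval useragent="Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
| eval os_name=case(
    match(useragent, "Windows NT 10\.0"), "Windows 10 or later",
    match(useragent, "Windows NT 6\.3"), "Windows 8.1",
    match(useragent, "Windows NT 6\.2"), "Windows 8",
    match(useragent, "Windows NT 6\.1"), "Windows 7",
    match(useragent, "Windows NT 6\.0"), "Windows Vista",
    match(useragent, "Windows NT 5\.1"), "Windows XP",
    match(useragent, "Macintosh"), "Mac OS X",
    true(), "OTHER")
| table useragent os_name
```

The dots are escaped because the second argument to match() is a regex, so an unescaped "6.1" would also match strings such as "6x1".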


skoelpin
SplunkTrust

Do two separate field extractions: one for your browser and the other for the OS.

(?P<OSextraction>(?<=Mozilla\/\d\.\d).*(?=))

answers.splunk.com is not allowing the full regular expression to be posted for some reason. Put < > before and after OSextraction.

I haven't tested this, but it should work. If not, send a few more lines of sample data and I'll fix it up.
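Since the expression above did not survive posting intact, here is a different and simpler sketch of the same general idea (grab what follows the Mozilla/x.x token), not a reconstruction of the original. It captures the first parenthesized block, which is where the OS details sit in all of the sample logs, and then splits it into tokens. The field names platform and platform_tokens are made up for illustration, and useragent is assumed to be the extracted field.

```spl
| makeresults
| eval useragent="Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
| rex field=useragent "^Mozilla\/\d\.\d \((?<platform>[^)]+)\)"
| eval platform_tokens=split(platform, "; ")
| table useragent platform platform_tokens
```

Working on the shorter platform string, or on its individual tokens, keeps any later OS regex or keyword test from accidentally matching text in the browser part of the string.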


sk314
Builder

Post some sample logs or sample extracted user agent strings and I'm sure one of us could help you with the regex.


julianj
Explorer

Posted, thanks for the help. The main struggle I have with a regex is that there are many different types of UAs of varying length and structure, so I'm wondering if there's an easier way to break them down. I just need OS and/or browser, so if there are blatantly indicative qualities I'd like to leverage them, but I've spent a lot of time trying to do that and can't quite get it, especially if I am interested in the version (a plus).
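For the version, one possible shortcut is to key on the handful of tokens that directly precede a browser version in the sample logs (MSIE, rv:, Chrome/, Version/) rather than on position. This is a sketch under that assumption only; the field name browser_ver is made up, and other browsers would need their own tokens added to the alternation.

```spl
| makeresults
| eval useragent=mvappend(
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/600.3.18 (KHTML, like Gecko) Version/8.0.3 Safari/600.3.18")
| mvexpand useragent
| rex field=useragent "(?:MSIE |rv:|Chrome\/|Version\/)(?<browser_ver>[\d\.]+)"
| table useragent browser_ver
```

Traced by hand against those four samples, this should pick up 9.0, 44.0.2403.107, 11.0, and 8.0.3 respectively. Combined with a keyword test for the browser name, that gives both name and version without one large positional regex.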


richgalloway
SplunkTrust

I think I see multiple browser strings in some of the logs. Which do you want?


julianj
Explorer

These are the UAs I'm given. There are even a few other ones that pop up on rare occasions, but I need a way to determine the browser and OS from all of them.
