Splunk Enterprise

How to extract Browser Version from useragent

shashank_24
Path Finder

Hi, is there an easy and straightforward way of extracting browser versions from access logs using the useragent string?

I have a requirement to list the top browsers and the top versions of each browser. I was able to extract the browser using the eval expression below, but getting the browser versions is tricky.

| eval browser = case(match(useragent,"Firefox"),"FireFox", match(useragent,"Chrome") AND NOT match(useragent,"Edge"),"Chrome", match(useragent,"Safari") AND NOT match(useragent,"Chrome"),"Safari", match(useragent, "MSIE|Trident|Edge"), "IE", NOT match(useragent, "Chrome|Firefox|Safari|MSIE|Trident|Edge"), "OTHERS")

Has anyone done this before who could steer me in the right direction? I can't install any apps, so it has to be done with regex. Please let me know if you can help. Thanks very much in advance.

Some examples of useragent strings:

Mozilla/5.0 (Linux; Android 9; ANE-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Mobile Safari/537.36

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15

Mozilla/5.0 (iPhone; CPU iPhone OS 15_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/198.0.425262635 Mobile/15E148 Safari/604.1

Best Regards,

Shashank

PickleRick
SplunkTrust

The HTTP/1.1 RFC (RFC 7231) specifies the format of the User-Agent header:

https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3

A user agent SHOULD send a User-Agent field in each request
   unless specifically configured not to do so.

     User-Agent = product *( RWS ( product / comment ) )

   The User-Agent field-value consists of one or more product
   identifiers, each followed by zero or more comments (Section 3.2 of
   [RFC7230]), which together identify the user agent software and its
   significant subproducts.  By convention, the product identifiers are
   listed in decreasing order of their significance for identifying the
   user agent software.


In theory, both the "product" name and its optional "version" (joined by a slash, as in product/version) are "tokens", which means neither may include whitespace. A "comment" is any string enclosed in parentheses.
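
As a rough sketch of how that grammar could map onto SPL, you could strip the parenthesised comments and then collect every product/version pair with rex and max_match=0. The field names ua_nocomment, ua_product, ua_version and ua_pair are made up for illustration, and the regex is a simplification of the RFC token rule rather than a full parser:

```
| makeresults
| eval useragent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15"
| eval ua_nocomment=replace(useragent, "[(][^)]*[)]", "")
| rex field=ua_nocomment max_match=0 "(?<ua_product>[^\s/]+)/(?<ua_version>\S+)"
| eval ua_pair=mvzip(ua_product, ua_version, "/")
| table useragent ua_product ua_version ua_pair
```

For the Safari sample from the question this should give the multivalue pairs Mozilla/5.0, AppleWebKit/605.1.15, Version/15.3 and Safari/605.1.15. The comment-stripping pattern is deliberately simple and would not cope with nested parentheses.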

Remember, though, that it is "just" an RFC, and User-Agent is a client-supplied value, so it can contain anything. You could probably classify all such outliers as "other".

ITWhisperer
SplunkTrust

There is no universal standard adopted by all browsers for the format of the user agent string, so any set of regex to extract this is likely to be incomplete at best.
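
That said, a best-effort version can be bolted onto the browser classification from the question, as long as anything that does not match is bucketed explicitly. A minimal sketch, assuming the useragent field and the browser eval from the original post. The v_* fields, browser_version and major_version are hypothetical names, and the version patterns only cover the common token layouts:

```
| makeresults
| eval useragent=split("Mozilla/5.0 (Linux; Android 9; ANE-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Mobile Safari/537.36|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15|Mozilla/5.0 (iPhone; CPU iPhone OS 15_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/198.0.425262635 Mobile/15E148 Safari/604.1", "|")
| mvexpand useragent
| eval browser=case(match(useragent,"Firefox"),"FireFox", match(useragent,"Chrome") AND NOT match(useragent,"Edge"),"Chrome", match(useragent,"Safari") AND NOT match(useragent,"Chrome"),"Safari", match(useragent,"MSIE|Trident|Edge"),"IE", NOT match(useragent,"Chrome|Firefox|Safari|MSIE|Trident|Edge"),"OTHERS")
| rex field=useragent "Firefox/(?<v_firefox>[\d.]+)"
| rex field=useragent "Chrome/(?<v_chrome>[\d.]+)"
| rex field=useragent "Version/(?<v_safari>[\d.]+)"
| rex field=useragent "(?:MSIE |rv:|Edge/)(?<v_ie>[\d.]+)"
| eval browser_version=coalesce(case(browser=="FireFox",v_firefox, browser=="Chrome",v_chrome, browser=="Safari",v_safari, browser=="IE",v_ie), "unknown")
| eval major_version=mvindex(split(browser_version,"."),0)
| top limit=10 browser major_version
```

With the three sample strings, the Chrome and desktop Safari rows should resolve to major versions 98 and 15. The iPhone GSA string is classified as Safari but carries no Version/ token, so it falls into "unknown", which is the incompleteness in practice.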
