Hi, is there an easy and straightforward way to extract browser versions from access logs using the User-Agent string?
I have a requirement to list the top browsers and the top versions of each browser. I managed to extract the browser using the eval expression below, but getting the browser versions is tricky.
| eval browser = case(match(useragent,"Firefox"),"FireFox", match(useragent,"Chrome") AND NOT match(useragent,"Edge"),"Chrome", match(useragent,"Safari") AND NOT match(useragent,"Chrome"),"Safari", match(useragent, "MSIE|Trident|Edge"), "IE", NOT match(useragent, "Chrome|Firefox|Safari|MSIE|Trident|Edge"), "OTHERS")
Has anyone done this before who could steer me in the right direction? I can't install any apps, so it has to be done with regex. Please let me know if you can help. Thanks very much in advance.
Some examples of User-Agent strings:
Mozilla/5.0 (Linux; Android 9; ANE-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Mobile Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15
Mozilla/5.0 (Linux; Android 9; ANE-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Mobile Safari/537.36
Mozilla/5.0 (iPhone; CPU iPhone OS 15_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/198.0.425262635 Mobile/15E148 Safari/604.1
Best Regards,
Shashank
The HTTP/1.1 RFC (RFC 7231, section 5.5.3) describes the User-Agent format:
https://datatracker.ietf.org/doc/html/rfc7231#section-5.5.3
A user agent SHOULD send a User-Agent field in each request unless specifically configured not to do so.
    User-Agent = product *( RWS ( product / comment ) )
The User-Agent field-value consists of one or more product identifiers, each followed by zero or more comments (Section 3.2 of [RFC7230]), which together identify the user agent software and its significant subproducts. By convention, the product identifiers are listed in decreasing order of their significance for identifying the user agent software.
In theory, each "product" and its "version" here should be a "token", which means it must NOT contain whitespace. A "comment" is any string enclosed in parentheses.
Keep in mind, though, that this is "just" an RFC, and User-Agent is a client-supplied value, so it can contain anything. You can probably classify all such outliers as "other".
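As a rough sketch of what that grammar gives you in SPL: if the strings follow the product/version token convention, a single rex with max_match=0 can pull every product and version pair into multivalue fields. The field names ua_product, ua_version and ua_pair are made up for this example, and useragent is assumed to be the field from the question. Comments in parentheses are not stripped first, so a comment that contains something like Build/XYZ would be picked up as well.

```spl
| rex field=useragent max_match=0 "(?<ua_product>[A-Za-z][\w.-]*)/(?<ua_version>\d[\w.]*)"
| eval ua_pair = mvzip(ua_product, ua_version, "/")
| table useragent ua_product ua_version ua_pair
```

For the first sample string in the question this should give Mozilla, AppleWebKit, Chrome and Safari as products, each with its version. You still have to decide which product actually names the browser, because almost every string starts with Mozilla/5.0.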
There is no universal standard adopted by all browsers for the format of the User-Agent string, so any set of regexes to extract this is likely to be incomplete at best.
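With that caveat in mind, one possible regex-only approach is to run one rex per browser family after the eval browser expression from the question, then pick the matching version with case(). This is only a sketch: the field names firefox_ver, chrome_ver, safari_ver, ie_ver, edge_ver, browser_version and browser_major are hypothetical, the patterns cover only the common token forms (Firefox/, Chrome/, Version/ for Safari, MSIE or rv: for Internet Explorer, Edge/ for legacy Edge), and anything that does not match falls back to "unknown".

```spl
| rex field=useragent "Firefox/(?<firefox_ver>\d+(?:\.\d+)*)"
| rex field=useragent "Chrome/(?<chrome_ver>\d+(?:\.\d+)*)"
| rex field=useragent "Version/(?<safari_ver>\d+(?:\.\d+)*)"
| rex field=useragent "(?:MSIE |rv:)(?<ie_ver>\d+(?:\.\d+)*)"
| rex field=useragent "Edge/(?<edge_ver>\d+(?:\.\d+)*)"
| eval browser_version = case(
    browser=="FireFox", firefox_ver,
    browser=="Chrome", chrome_ver,
    browser=="Safari", safari_ver,
    browser=="IE", coalesce(edge_ver, ie_ver),
    true(), "unknown")
| eval browser_version = coalesce(browser_version, "unknown")
| eval browser_major = mvindex(split(browser_version, "."), 0)
| stats count by browser browser_version
| sort - count
```

Swap browser_version for browser_major in the stats if you only care about major versions. Note that the GSA sample from the question carries a Safari/ token but no Version/ token, so it would be counted as Safari with version "unknown". That is the kind of gap to expect with this approach.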