Knowledge Management

How to write a regular expression for extracting User Agent details?

jaibalaraman
Path Finder

Hi 

I tried using rex to extract user agent details, but I am finding it difficult because of my limited Splunk knowledge. With the rex commands below I managed to extract the OS and version.

The rex commands below work fine, but I don't know how to capture the additional details listed in the table further down.

1 - \((?P<os>[^;]+);(?P<vers>[^;)]+).*$ 

2 - | rex "\(.*(?<OS>Android\s\d+|OS \d+_\d+|Windows NT\s\d+\.\d+)\;?.*\)"
| fillnull value="unrecognised" OS

3 - rex "\((?P<osinfo>[^\)]+)\)" | rex field=osinfo "(?P<os>[^;]+);(?P<vers>[^;]+)(;(?P<etc>[^;]+))?" | stats count by os, vers

I would like to extract the fields in the format below. Would that be possible?

Mobile Device | Software name | Software version | Layout Engine | OS System | OS | OS version
A10 - SM-A105G | Chrome | 86.0.42.40.185 | Blink | Android 10 | Android | 10
iPhone | Safari | 14 | WebKit | iOS 14.1 | iOS | 14.1
Desktop | Chrome 86 | 86.0.4240.111 | Blink | Windows 10 | Windows | 10

 

The user agent has a different format for iOS, Android, and desktop, as shown below:

Android user: Mozilla/5.0 (Linux; Android 10; SAMSUNG SMT590) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser / 12.1 Chrome/79.0.3945.136 Safari/537.36

 

iPhone user: Mozilla/5.0 (iPhone; CPU iPhone OS 14_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1

Desktop user: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

HP device: Mozilla/5.0 (Linux; Android 5.1.1; HP Pro Slate 12 Build/LMY47V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Safari/537.36

Could anyone please help me write a regular expression that fills in the columns of the table?

 

Thanks 

0 Karma

gcusello
SplunkTrust

Hi @jaibalaraman,

You have to use a regex for each kind of log, plus some transformations using eval, to display results in the format you want when it differs from the logs (e.g. iOS 14.1 appears in the logs as OS 14_1).

In addition the "Layout Engine" field isn't present in the logs.

So try these regexes:

iPhone

 

| rex "\((?<mobile_device>\w+);\s+\w+\s+\w+\s+(?<os>\w+)\s+(?<os_version>\w+).*Version\/(?<software_version>[^ ]+)\s+\w+\/\w+\s+(?<software_name>\w+)\/\d+\.\d+$"
| replace "OS" with "iOS" in os
| replace "*_*" with "*.*" in os_version
| eval os_system=os." ".os_version

 

You can test the regex at https://regex101.com/r/KCegdc/2
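For anyone who wants to check this offline, here is a minimal Python sketch of the same steps. It assumes the iPhone user agent from the question with its usual ";", ")" and "," separators in place, and `parse_iphone` is a hypothetical helper name, not anything Splunk provides. Python writes named groups as `(?P<name>...)`, otherwise the pattern is unchanged.

```python
import re

# Assumed sample: the iPhone user agent from the question, with the usual
# ";", ")" and "," separators in place.
UA_IPHONE = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_1 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/14.0 Mobile/15E148 Safari/604.1"
)

# Same pattern as the iPhone rex; only the named-group syntax differs.
IPHONE_RE = re.compile(
    r"\((?P<mobile_device>\w+);\s+\w+\s+\w+\s+(?P<os>\w+)\s+(?P<os_version>\w+)"
    r".*Version/(?P<software_version>[^ ]+)\s+\w+/\w+\s+"
    r"(?P<software_name>\w+)/\d+\.\d+$"
)


def parse_iphone(ua):
    """Hypothetical helper mirroring the rex, replace and eval steps."""
    m = IPHONE_RE.search(ua)
    if m is None:
        return None
    fields = m.groupdict()
    if fields["os"] == "OS":
        fields["os"] = "iOS"  # like: replace "OS" with "iOS" in os
    # like: replace "*_*" with "*.*" in os_version (here every "_" is replaced)
    fields["os_version"] = fields["os_version"].replace("_", ".")
    # like: eval os_system=os." ".os_version
    fields["os_system"] = fields["os"] + " " + fields["os_version"]
    return fields


print(parse_iphone(UA_IPHONE))
```

If the helper returns `None` for one of your real events, the event probably deviates from this layout (for example no `Version/` token), which is worth checking before changing the regex.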

Android

The device information is also missing here.

 

| rex "\(\w+;\s+(?<os>\w+)\s+(?<os_version>\w+);.*SamsungBrowser\s+\/\s+\d+\.\d+\s+(?<software_name>[^\/]+)\/(?<software_version>[^ ]+)"
| eval os_system=os." ".os_version

 

You can test the regex at https://regex101.com/r/poQV2h/1
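A rough Python check of the Android pattern, assuming the two Android user agents from the question with their usual separators in place. In this sketch the pattern matches the Samsung string but not the HP one, because it expects the `SamsungBrowser / 12.1` token and a version made of word characters only (no dots). `LOOSE_ANDROID_RE` and `extract` are hypothetical names of mine, and the looser pattern is only one possible variant, not a tested production regex.

```python
import re

# Assumed samples: user agents from the question with the usual ";", ")"
# and "," separators in place.
UA_SAMSUNG = (
    "Mozilla/5.0 (Linux; Android 10; SAMSUNG SMT590) AppleWebKit/537.36 "
    "(KHTML, like Gecko) SamsungBrowser / 12.1 Chrome/79.0.3945.136 Safari/537.36"
)
UA_HP = (
    "Mozilla/5.0 (Linux; Android 5.1.1; HP Pro Slate 12 Build/LMY47V; wv) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 "
    "Safari/537.36"
)

# Same pattern as the Android rex, in Python named-group syntax.
ANDROID_RE = re.compile(
    r"\(\w+;\s+(?P<os>\w+)\s+(?P<os_version>\w+);.*SamsungBrowser\s+/\s+\d+\.\d+\s+"
    r"(?P<software_name>[^/]+)/(?P<software_version>[^ ]+)"
)

# Hypothetical looser variant: dotted OS versions, no SamsungBrowser token,
# and the text after the version captured as the device.
LOOSE_ANDROID_RE = re.compile(
    r"\(Linux;\s+(?P<os>Android)\s+(?P<os_version>[\d.]+);\s*"
    r"(?P<mobile_device>[^;)]+).*\s(?P<software_name>Chrome)/"
    r"(?P<software_version>[^ ]+)"
)


def extract(pattern, ua):
    """Return the named groups plus os_system, or None when nothing matches."""
    m = pattern.search(ua)
    if m is None:
        return None
    fields = m.groupdict()
    fields["os_system"] = fields["os"] + " " + fields["os_version"]
    return fields


for name, ua in (("samsung", UA_SAMSUNG), ("hp", UA_HP)):
    print(name, extract(ANDROID_RE, ua), extract(LOOSE_ANDROID_RE, ua))
```

The looser variant only looks for Chrome, so Android browsers that do not advertise a `Chrome/` token would still fall through.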

Desktop

 

| rex "\((?<os>\w+)\s+\w+\s+(?<os_version>[^;]+)[^\)]+\)[^\)]+\)\s+(?<software_name>[^\/]+)\/(?<software_version>[^ ]+)"
| eval os_system=os." ".os_version

 

You can test the regex at https://regex101.com/r/chALlI/1
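The desktop pattern can be checked the same way. This is a small Python sketch, assuming the desktop user agent from the question with its usual separators in place; `parse_desktop` is a hypothetical helper name. Note that the extracted version is `10.0`, so reaching "Windows 10" as in the requested table would need an extra transformation.

```python
import re

# Assumed sample: the desktop user agent from the question, with the usual
# ";", ")" and "," separators in place.
UA_DESKTOP = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
)

# Same pattern as the desktop rex, in Python named-group syntax.
DESKTOP_RE = re.compile(
    r"\((?P<os>\w+)\s+\w+\s+(?P<os_version>[^;]+)[^)]+\)[^)]+\)\s+"
    r"(?P<software_name>[^/]+)/(?P<software_version>[^ ]+)"
)


def parse_desktop(ua):
    """Hypothetical helper mirroring the rex and eval steps."""
    m = DESKTOP_RE.search(ua)
    if m is None:
        return None
    fields = m.groupdict()
    # like: eval os_system=os." ".os_version
    fields["os_system"] = fields["os"] + " " + fields["os_version"]
    return fields


print(parse_desktop(UA_DESKTOP))
```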

Ciao.

Giuseppe

0 Karma

jaibalaraman
Path Finder

Hi 

I tried it, but it's not working. Could you please help me fix this issue?

 

thanks

0 Karma

gcusello
SplunkTrust

Hi @jaibalaraman,

Could you better describe "not working"?

No results? Wrong results? An error message? What is it about?

Did you try one regex or two regexes?

Ciao.

Giuseppe

0 Karma

jaibalaraman
Path Finder

Hi  @gcusello 

Thank you so much for your valuable input. Yes, it's working; however, for Android and iPhone it doesn't return the fields.

For desktop it's working fine (see img1 and img2); however, img3 shows that the selected fields are not being extracted for Android and iPhone. Could you help me with this, please?

Thanks 

0 Karma

ITWhisperer
SplunkTrust

You asked a very similar question again last week here 

You may need to escape the hyphens and the slashes. You should try your rex at regex101.com - you can copy all the user agent lines in and see how well your rex works against them all. You may want to try breaking up the string into parts and using other rex on just parts e.g.

| rex "(?<firstpart>[^\(]+)\((?<secondpart>[^\)]+)\)(?<thirdpart>[^\(]+)\((?<fourthpart>[^\)]+)\)(?<fifthpart>.*)"
| rex field=secondpart "(?<OS>Android|Windows|OS)"
| rex field=fifthpart "(?<browser>Safari|Chrome)"

etc. Note that not all user agent strings follow this pattern, so you may still get some that fall through, but you can find those and extend your rex to cover them all eventually (until a manufacturer brings out a new phone or OS that you hadn't accounted for!). This is an ongoing activity, and you might want to question the value you are getting from knowing this information!
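A quick way to see how the split-then-refine idea behaves is a small Python sketch like the one below. It assumes the four user agents from the question with their usual ";", ")" and "," separators in place, and `split_ua` is a hypothetical helper name. The patterns are the ones from the rex lines, written with Python's `(?P<name>...)` syntax.

```python
import re

# Assumed samples: user agents from the question with the usual separators.
SAMPLES = {
    "android": "Mozilla/5.0 (Linux; Android 10; SAMSUNG SMT590) AppleWebKit/537.36 "
               "(KHTML, like Gecko) SamsungBrowser / 12.1 Chrome/79.0.3945.136 Safari/537.36",
    "iphone": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_1 like Mac OS X) AppleWebKit/605.1.15 "
              "(KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1",
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36",
    "hp": "Mozilla/5.0 (Linux; Android 5.1.1; HP Pro Slate 12 Build/LMY47V; wv) "
          "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Safari/537.36",
}

# First pass: cut the string into five parts around the two (...) groups.
PARTS_RE = re.compile(
    r"(?P<firstpart>[^(]+)\((?P<secondpart>[^)]+)\)(?P<thirdpart>[^(]+)"
    r"\((?P<fourthpart>[^)]+)\)(?P<fifthpart>.*)"
)
# Second pass: small patterns applied to one part each.
OS_RE = re.compile(r"(?P<OS>Android|Windows|OS)")
BROWSER_RE = re.compile(r"(?P<browser>Safari|Chrome)")


def split_ua(ua):
    """Hypothetical helper: split first, then refine the parts."""
    parts = PARTS_RE.search(ua)
    if parts is None:
        return None  # a string that falls through; extend the patterns for it
    os_match = OS_RE.search(parts.group("secondpart"))
    browser_match = BROWSER_RE.search(parts.group("fifthpart"))
    return {
        "OS": os_match.group("OS") if os_match else "unrecognised",
        "browser": browser_match.group("browser") if browser_match else "unrecognised",
    }


for name, ua in SAMPLES.items():
    print(name, split_ua(ua))
```

Because each second-pass pattern only sees one part, it can stay short, and new devices usually mean extending one alternation rather than rewriting a long regex.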

0 Karma