I have a saved search like this, which I want to put on a dashboard as a pie chart:
sourcetype="access_combined" host="us1-p01" | transaction clientip useragent | stats count as browserHits by useragent
We do not want to use the Web Intelligence app.
My problem is that this search only gives me a bare count per raw useragent string, for example:
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a) Gecko/20040416 ipMonitor/10.6"
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html"
I want to be able to display all statistics gathered (see below) in the form of a pie chart:
RECOGNISED BROWSERS
IE 7: 84 (19%)
Chrome: 49 (11%)
Safari: 8 (2%)
Firefox 5: 1 (0%)
Firefox 6: 0 (0%)
Firefox 7: 0 (0%)
Firefox 8: 0 (0%)
Unknown Android Browser: 3 (1%)
I know you can use Python scripts, but I am not sure of the specifics of how to go about it.
As I understand it, it's a three-step process:
i) Write the results of the search above to a log file.
ii) Write a Python program (https://github.com/tobie/ua-parser/blob/master/py/ua_parser/user_agent_parser.py) to convert user agents into browser names that include the version and OS.
iii) Pipe these stats back into Splunk and put the results on a dashboard.
As it turns out this is much harder than it sounds.
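For illustration, step ii) could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: it does not use ua-parser, the names `BROWSER_RULES`, `classify` and `browser_stats` are hypothetical, and the regexes are illustrative and cover just a handful of browsers.

```python
import re
from collections import Counter

# Hypothetical, ordered rules: the first matching rule wins.
# Bots are checked first, and Chrome before Safari, because Chrome
# user agents also contain the token "Safari/".
BROWSER_RULES = [
    ("Bot", re.compile(r"bot|crawler|spider", re.I)),
    ("IE {0}", re.compile(r"MSIE (\d+)")),
    ("Firefox {0}", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/")),
    ("Safari", re.compile(r"Safari/")),
]


def classify(useragent):
    """Map a raw useragent string to a short browser label."""
    for label, pattern in BROWSER_RULES:
        match = pattern.search(useragent)
        if match:
            # Fill in the major version when the rule captures one.
            return label.format(*match.groups())
    return "Unknown"


def browser_stats(useragents):
    """Return (browser, hits, rounded percent) tuples, most frequent first."""
    counts = Counter(classify(ua) for ua in useragents)
    total = sum(counts.values())
    return [
        (name, hits, round(100.0 * hits / total))
        for name, hits in counts.most_common()
    ]


if __name__ == "__main__":
    sample = [
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html",
    ]
    for name, hits, percent in browser_stats(sample):
        print("%s: %d (%d%%)" % (name, hits, percent))
```

The ordering of the rules matters here: because many useragent strings contain tokens from several browsers, a first-match-wins list is one simple way to keep the labels from overlapping.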
I would consider using eventtypes to clean up the raw useragent data.
You can iteratively come up with a set of eventtypes that are non-overlapping, but it will take some work. Note that the Web Intelligence app might already have them, so it is worth taking a look at that app anyway. I know it has pretty nice eventtypes for picking up (and filtering out) bots.
Suppose you had some eventtypes like "browser-ie7", "browser-ie8", "browser-firefox5", etc.
You can check that they are not overlapping by keeping an eye on searches like this:
sourcetype=access_combined | eval browsers=mvfilter(match(eventtype, "^browser-")) | eval matchCount=mvcount(browsers) | where isnull(matchCount) OR matchCount!=1
Ideally this search returns nothing, because each event should match one and only one of the 'browser' eventtypes.
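The same "exactly one match" check can be prototyped outside Splunk while you are still drafting the patterns. The following is a minimal sketch, assuming each eventtype boils down to a regex on the useragent string; the dictionary `BROWSER_PATTERNS`, its patterns, and the function names are hypothetical stand-ins, not real eventtype definitions.

```python
import re

# Hypothetical stand-ins for "browser-*" eventtypes, one regex each.
BROWSER_PATTERNS = {
    "browser-ie7": re.compile(r"MSIE 7\."),
    "browser-ie8": re.compile(r"MSIE 8\."),
    "browser-firefox5": re.compile(r"Firefox/5\."),
}


def matching_types(useragent):
    """Return the names of all patterns that match this useragent."""
    return [
        name
        for name, pattern in BROWSER_PATTERNS.items()
        if pattern.search(useragent)
    ]


def find_violations(useragents):
    """Return (useragent, matches) pairs that match zero or several patterns."""
    violations = []
    for useragent in useragents:
        matches = matching_types(useragent)
        if len(matches) != 1:
            violations.append((useragent, matches))
    return violations
```

An empty result from `find_violations` over a representative sample of useragent strings suggests the patterns are both complete and non-overlapping for that sample.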
Then, once you have reliable eventtypes:
sourcetype="access_combined" host="us1-ecojbs-p01" | transaction clientip useragent | eval browser=mvfilter(match(eventtype, "^browser-")) | top 20 browser
Or, if you prefer using stats, even though it's a bit more verbose:
sourcetype="access_combined" host="us1-ecojbs-p01" | transaction clientip useragent | eval browser=mvfilter(match(eventtype, "^browser-")) | stats count as browserHits by browser | eventstats sum(browserHits) as total | eval percent=100*browserHits/total | fields - total
Are you referring to field extractions?
I am not sure I follow what an "eventtype" is.