Here's the situation:
I have one set of web log events that represent people using my app, which I generally display in a time-scaled graph split by the version they are using.
I have another set of web log events that represent crash reports, which I display the same way.
I'd like to calculate a "crash incidence / usage", which would require taking the count of crashes for a particular time bucket and dividing it by the count of usage events over the same time period.
Short of running extractions using the CLI and doing my own math (and injecting the results back in as an input), is there any way to express this kind of query in native Splunk?
Nick's search is certainly on the right path, but it doesn't split by version. Let's take it further to get percentages, assuming we're starting with just basic counts like:
sourcetype=access_combined | timechart count by version
sourcetype=some_crash_log | timechart count by version
Then we'll use the same technique of taking the OR of the two sourcetypes, but this time liberally use "eval" in timechart, both to calculate the number of events per sourcetype and the ratio of the two sourcetypes:
sourcetype=access_combined OR sourcetype=some_crash_log
| timechart eval(count(eval(sourcetype=="some_crash_log"))/count(eval(sourcetype=="access_combined"))) AS crash_ratio
Now, we could have broken this down for clarity, like:
sourcetype=access_combined OR sourcetype=some_crash_log
| eval is_crash = if(sourcetype=="some_crash_log",1,0)
| eval is_access = if(sourcetype=="access_combined",1,0)
| timechart eval(sum(is_crash)/sum(is_access)) AS crash_ratio by version
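Outside Splunk, the indicator-flag technique above is just conditional counting per time bucket followed by a division. A minimal Python sketch (event tuples and bucket labels are made up for illustration):

```python
# Simulate is_crash / is_access counters per time bucket, then take the ratio.
from collections import defaultdict

# (time_bucket, sourcetype) pairs standing in for raw events
events = [
    ("10:00", "access_combined"), ("10:00", "access_combined"),
    ("10:00", "some_crash_log"),
    ("11:00", "access_combined"), ("11:00", "access_combined"),
    ("11:00", "access_combined"), ("11:00", "some_crash_log"),
]

crashes = defaultdict(int)
accesses = defaultdict(int)
for bucket, sourcetype in events:
    crashes[bucket] += 1 if sourcetype == "some_crash_log" else 0    # is_crash
    accesses[bucket] += 1 if sourcetype == "access_combined" else 0  # is_access

# Equivalent of: timechart eval(sum(is_crash)/sum(is_access))
ratios = {b: crashes[b] / accesses[b] for b in sorted(accesses)}
print(ratios)  # {'10:00': 0.5, '11:00': 0.3333333333333333}
```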
Thanks for the answers! I'll add my own final solution, which is much along the lines of Stephen's in that I perform one primary search and then slice and dice the results. The really tricky part is that I want this answer for more than one build at a time, so the only way I could figure it was to use the autoregress command to peek back at events of the "other" type and then calculate the answer only where the requisite data was available. It's pretty awesome that the Splunk query language can express this, since it's really an iterative process:
host=myhost
| extract reload=T
| search (script="vercheck.cgi" OR script="crashreport.cgi")
| stats count as startupCount by bld, script
| autoregress startupCount as crashCount
| autoregress script as prevScript
| autoregress bld as prevBld
| eval BSI=if(script="vercheck.cgi" AND prevScript="crashreport.cgi",100*(startupCount-crashCount)/startupCount,0)
| search BSI>0 AND startupCount>100
| fields + bld, BSI, startupCount, crashCount
which produces the nice, clean output:
bld     BSI        startupCount  crashCount
350447  70.967742  434           126
350352  75.700935  107           26
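To make the autoregress trick concrete, here is a Python sketch (outside Splunk) of the same pairing: after the stats step, each build's vercheck.cgi row immediately follows its crashreport.cgi row, so peeking one row back lines up the two counts. The counts are taken from the output table above; the same-build check is an assumption added for safety.

```python
# (bld, script, count) rows as "stats count by bld, script" would emit them,
# sorted by bld then script (crashreport.cgi sorts before vercheck.cgi).
rows = [
    ("350352", "crashreport.cgi", 26),
    ("350352", "vercheck.cgi", 107),
    ("350447", "crashreport.cgi", 126),
    ("350447", "vercheck.cgi", 434),
]

results = []
for prev, cur in zip(rows, rows[1:]):  # autoregress: look back one row
    prev_bld, prev_script, crash_count = prev
    bld, script, startup_count = cur
    # Only compute BSI where a vercheck row directly follows its crash row
    if script == "vercheck.cgi" and prev_script == "crashreport.cgi" and bld == prev_bld:
        bsi = 100 * (startup_count - crash_count) / startup_count
        if bsi > 0 and startup_count > 100:  # equivalent of: search BSI>0 AND startupCount>100
            results.append((bld, round(bsi, 6), startup_count, crash_count))

print(results)
# [('350352', 75.700935, 107, 26), ('350447', 70.967742, 434, 126)]
```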
In the end, I realized that the output data is sensitive to the timescale of the search, but it's not really bucketed the way a timechart is.
Thanks again for the community support!
So you've got two searches. I'm taking some liberties here, but let's assume they are:
sourcetype=access_combined | timechart count by version
sourcetype=some_crash_log | timechart count by version
And at the end you want a chart showing the ratio (or percentage) of crash incidence / usage over time. I think this should work:
sourcetype=access_combined OR sourcetype=some_crash_log
| eval temp=if(searchmatch("sourcetype=access_combined"),"total","crashes")
| timechart count by temp
It'll show two values varying over time: the first is the total of access_combined events during that period, the other is the number of crash events during that period. If you want ratios or percentages from there, just tack on another eval after the timechart; the math should be straightforward (you'll have a field called 'total' and a field called 'crashes' per row).
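That follow-on step is simple row-wise arithmetic. A Python sketch (outside Splunk) of what the extra eval would compute per timechart row; the row values here are made up for illustration:

```python
# Each timechart row carries a 'total' and a 'crashes' column.
rows = [
    {"_time": "10:00", "total": 400, "crashes": 8},
    {"_time": "11:00", "total": 250, "crashes": 10},
]

for row in rows:
    # equivalent of: ... | eval percent = 100 * crashes / total
    row["percent"] = 100 * row["crashes"] / row["total"]

print([(r["_time"], r["percent"]) for r in rows])  # [('10:00', 2.0), ('11:00', 4.0)]
```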
You're on a well-trodden path. People almost always find themselves at a crossroads where, intuitively, they feel they have to somehow combine two search results. Although this is possible, it's not really the best approach, but nonetheless they often go off reading about the "join" command and other complicated things. The Splunk approach here is to go back and rethink the search so that one search can do both things. And you very often can, with a little "eval" and a little "stats".