Splunk Search

How can I perform math on the counts from two queries?

markgo
Engager

Here's the situation:

I have one set of web log events that represent people using my app which I generally display in a timescaled graph split by the version they are using.

I have another set of web log events that represent crash reports, which I display the same way.

I'd like to calculate a "crash incidence / usage", which would require taking the count of crashes for a particular time bucket and dividing it by the count of usage events over the same time period.
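
For example, 3 crashes against 600 usage events in the same time bucket would give a crash incidence of 3/600 = 0.5% for that bucket.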

Short of running extractions using the CLI and doing my own math (and injecting the results back in as an input), is there any way to express these kind of queries in native Splunk?

1 Solution

Stephen_Sorkin
Splunk Employee

Nick's search is certainly on the right path, but it doesn't split by version. Let's take it further to get percentages, assuming we're starting with just basic counts like:

sourcetype=access_combined | timechart count by version
sourcetype=some_crash_log | timechart count by version

Then we'll use the same technique of taking the OR of the two sourcetypes, but this time liberally use "eval" in timechart, both to calculate the number of events per sourcetype and the ratio of the two sourcetypes:

sourcetype=access_combined OR sourcetype=some_crash_log
| timechart eval(count(eval(sourcetype=="some_crash_log"))/count(eval(sourcetype=="access_combined")))

Now, we could have broken this down for clarity, like:

sourcetype=access_combined OR sourcetype=some_crash_log
| eval is_crash = if(sourcetype=="some_crash_log",1,0)
| eval is_access = if(sourcetype=="access_combined",1,0)
| timechart eval(sum(is_crash)/sum(is_access)) by version
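
To get an actual percentage rather than a raw ratio, a likely variant (just a sketch, assuming the split-by field is named "version" as in the searches above) scales the same expression by 100:

sourcetype=access_combined OR sourcetype=some_crash_log
| eval is_crash = if(sourcetype=="some_crash_log",1,0)
| eval is_access = if(sourcetype=="access_combined",1,0)
| timechart eval(100*sum(is_crash)/sum(is_access)) by version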


markgo
Engager

Thanks for the answers! I'll add my own final solution, which is much along the lines of Stephen's in that I perform one primary search and then slice and dice the results. The really tricky part is that I want this answer for more than one build at a time, so the only way I could figure it out was to use the autoregress command to peek back at events of the "other" type and then calculate the answer only where the requisite data was available. It's pretty awesome that the Splunk query language can express this, as it's really an iterative process:

host=myhost
| extract reload=T
| search (script="vercheck.cgi" OR script="crashreport.cgi")
| stats count as startupCount by bld, script
| autoregress startupCount as crashCount
| autoregress script as prevScript
| autoregress bld as prevBld
| eval BSI=if(script="vercheck.cgi" AND prevScript="crashreport.cgi",100*(startupCount-crashCount)/startupCount,0)
| search BSI>0 AND startupCount>100
| fields + bld, BSI, startupCount, crashCount

which produces the nice, clean output:

      bld      BSI         startupCount  crashCount
1     350447   70.967742   434           126
2     350352   75.700935   107           26

In the end, I realized that the output data is sensitive to the timescale of the search, but it's not really bucketed the way a timechart is.
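
For comparison, a similar per-build breakdown can probably be written without autoregress by counting both event types in one stats call. The sketch below is untested and simply reuses the field and script names from the query above:

host=myhost
| extract reload=T
| search (script="vercheck.cgi" OR script="crashreport.cgi")
| stats count(eval(script=="crashreport.cgi")) as crashCount count(eval(script=="vercheck.cgi")) as startupCount by bld
| eval BSI=100*(startupCount-crashCount)/startupCount
| where startupCount>100 AND BSI>0
| fields + bld, BSI, startupCount, crashCount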

Thanks again for the community support!

sideview
SplunkTrust

So you've got two searches. I'm taking some liberties here, but let's assume they are:

sourcetype=access_combined | timechart count by version

sourcetype=some_crash_log | timechart count by version

And at the end you want a chart showing the ratio (or percentage) of crash incidence to usage over time. I think this should work:

sourcetype=access_combined OR sourcetype=some_crash_log
| eval temp=if(searchmatch("sourcetype=access_combined"),"total","crashes")
| timechart count by temp

It'll show two values varying over time: the total number of access_combined events during each period, and the number of crash events during that same period. If you want ratios or percentages from there, you can just tack on another eval after the timechart; the math is straightforward (you'll have a field called 'total' and a field called 'crashes' in each row), as sketched below.
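
For instance, a sketch of that follow-up eval (crash_ratio is just an illustrative field name):

sourcetype=access_combined OR sourcetype=some_crash_log
| eval temp=if(searchmatch("sourcetype=access_combined"),"total","crashes")
| timechart count by temp
| eval crash_ratio=crashes/total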

You're on a well-trodden path. People almost always find themselves at a crossroads where, intuitively, they feel they have to somehow combine two search results. Although that is possible, it's not really the best approach, yet people often go off reading about the join command and other complicated things. The Splunk approach here is to go back and rethink the search so that one search can do both things. And you very often can, with a little eval and a little stats.
