Hi everyone
I just started working with Splunk and I have a query in which one of the steps is to count the number of instances where a certain field has a value > 10. But I actually have to count the number of instances with value > 10, > 15, > 30, > 60, > 120, and > 180. The way I'm doing it now is just by executing separate counts, like the following:
<search>...
| eval var1=...
| stats
count(eval(var1 > 10)) as count10,
count(eval(var1 > 15)) as count15,
count(eval(var1 > 30)) as count30,
count(eval(var1 > 60)) as count60,
count(eval(var1 > 120)) as count120,
count(eval(var1 > 180)) as count180
...
But I'm aware this is definitely not the optimal way as, to my understanding, this will go through all the instances and count the ones > 10, then will go through all the instances again counting the ones > 15 and so on.
How would I execute this count making use of the fact that, e.g., to count the number of instances > 120, I can check only considering the set of instances > 60 and so on? That is, how do I chain these counts and use them as "filters"?
It's important to note that I don't want to use "where var1 > 10" multiple times, as I also need to compute other metrics related to the whole dataset (e.g., avg(var1)) and, to my understanding, using just one
| stats count(eval(var1 > 10)) as count10
will "drop" all of the other columns of my query.
Anyways, how would I do this?
Thank you in advance.
Hi @pedropiin
You could try something like the following. I have used makeresults to visualise this as I don't have your data.
| makeresults count=100
| streamstats count as var1
| eval N=CASE(var1>180,180,var1>120,120,var1>60,60,var1>30,30,var1>15,15,var1>10,10)
| eval count{N}=1
| fields - count
| stats sum(count*) AS count*
| fillnull value=0 count10 count15 count30 count60 count120 count180
| eval count10=count10+count15+count30+count60+count120+count180
| eval count15=count15+count30+count60+count120+count180
| eval count30=count30+count60+count120+count180
| eval count60=count60+count120+count180
| eval count120=count120+count180
This assumes you want count10 to include anything where var1 is over 10, even if it's also over 30.
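Against your real data, you would drop the makeresults/streamstats lines and start from your own base search; any dataset-wide metrics (like your avg(var1)) can ride along in the same stats call. An untested sketch, keeping your var1 field name (avgvar1 is just a placeholder, and if your events already contain fields starting with "count" you would want to exclude those first):
<search>...
| eval var1=...
| eval N=CASE(var1>180,180,var1>120,120,var1>60,60,var1>30,30,var1>15,15,var1>10,10)
| eval count{N}=1
| stats avg(var1) as avgvar1, sum(count*) AS count*
| fillnull value=0 count10 count15 count30 count60 count120 count180
followed by the same chained eval additions as above.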
Please let me know how you get on, and consider accepting this answer or adding karma to it if it has helped.
Regards
Will
First of all, thanks @livehybrid for your suggestion. It worked perfectly.
Now, regarding what @richgalloway and @ITWhisperer proposed, you are both right as well. I'm not sure I was able to understand everything in the job inspector, but I ran multiple test queries, and there is no real difference between my previous approach and the "new" one: both take the same amount of time, with no big difference in the number of invocations of each "function"/"method". So thanks for the heads-up!
If your problem is resolved, then please click the "Accept as Solution" button to help future readers.
@pedropiin wrote: But I'm aware this is definitely not the optimal way as, to my understanding, this will go through all the instances and count the ones > 10, then will go through all the instances again counting the ones > 15 and so on.
I'm not convinced this is correct. Have you looked at the job inspector stats for this search? I think you'll find it's not that inefficient. Any attempt to "chain" filters is likely to perform much worse.
I agree with @richgalloway; I too am not convinced that your interpretation is correct. The way I look at it, SPL processing forms an event pipeline: each command takes each event from its input pipeline, processes it, and gets what it needs from that event. It doesn't go back and process the event multiple times. The stats command only outputs statistics events to its output pipeline once it has finished processing all the events on its input pipeline. Most, but not all, commands work in this way (streamstats could be seen as a notable exception).
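A toy run makes the distinction visible (illustration only):
| makeresults count=5
| streamstats count as n
| stats sum(n) as total
Here streamstats annotates each of the five events with a running count (n=1..5) as they stream past, while stats emits its single row (total=15) only after it has consumed all five events from its input pipeline.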