Hi everyone
I just started working with Splunk and I have a query in which one of the steps is to count the number of instances where a certain field has a value > 10. But I actually have to count the number of instances with value > 10, > 15, > 30, > 60, > 120, and > 180. The way I'm doing it now is just by executing separate counts, like the following:
<search>...
| eval var1=...
| stats
count(eval(var1 > 10)) as count10,
count(eval(var1 > 15)) as count15,
count(eval(var1 > 30)) as count30,
count(eval(var1 > 60)) as count60,
count(eval(var1 > 120)) as count120,
count(eval(var1 > 180)) as count180
...
But I'm aware this is definitely not the optimal way as, to my understanding, this will go through all the instances and count the ones > 10, then will go through all the instances again counting the ones > 15 and so on.
How would I execute this count making use of the fact that, e.g., to count the number of instances > 120, I can check only considering the set of instances > 60 and so on? That is, how do I chain these counts and use them as "filters"?
It's important to note that I don't want to use "where var1 > 10" multiple times, as I also need to compute other metrics related to the whole dataset (e.g., avg(var1)) and, to my understanding, using just one
| stats count(eval(var1 > 10)) as count10
will "drop" all of the other columns of my query.
Anyways, how would I do this?
Thank you in advance.
Hi @pedropiin
You could try something like the following. I have used makeresults to visualise this as I don't have your data.
| makeresults count=100
| streamstats count as var1
| eval N=CASE(var1>180,180,var1>120,120,var1>60,60,var1>30,30,var1>15,15,var1>10,10)
| eval count{N}=1
| fields - count
| stats sum(count*) AS count*
| fillnull value=0 count10 count15 count30 count60 count120 count180
| eval count10=count10+count15+count30+count60+count120+count180
| eval count15=count15+count30+count60+count120+count180
| eval count30=count30+count60+count120+count180
| eval count60=count60+count120+count180
| eval count120=count120+count180
This assumes you want count10 to include anything where var1 is over 10, even if it's also over 30.
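Against your real data, you would drop the makeresults/streamstats lines and start from your own base search; any dataset-wide metrics (like your avg(var1)) can ride along in the same stats call. An untested sketch, keeping your var1 field name (avgvar1 is just a placeholder, and if your events already contain fields starting with "count" you would want to exclude those first):
<search>...
| eval var1=...
| eval N=CASE(var1>180,180,var1>120,120,var1>60,60,var1>30,30,var1>15,15,var1>10,10)
| eval count{N}=1
| stats avg(var1) as avgvar1, sum(count*) AS count*
| fillnull value=0 count10 count15 count30 count60 count120 count180
followed by the same chained eval additions as above.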
Please let me know how you get on, and consider accepting this answer or adding karma to it if it has helped.
Regards
Will
First of all, thanks @livehybrid for your suggestion. It worked perfectly.
Now, regarding what @richgalloway and @ITWhisperer proposed, you are both right as well. I'm not sure I was able to understand everything in the job inspector, but I ran multiple test queries, and there is no real difference between my previous approach and the "new" one: both take the same amount of time, with no big difference in the number of invocations of each "function"/"method". So thanks for the heads-up!
If your problem is resolved, then please click the "Accept as Solution" button to help future readers.
@pedropiin wrote: But I'm aware this is definitely not the optimal way as, to my understanding, this will go through all the instances and count the ones > 10, then will go through all the instances again counting the ones > 15 and so on.
I'm not convinced this is correct. Have you looked at the job inspector stats for this search? I think you'll find it's not that inefficient. Any attempt to "chain" filters is likely to perform much worse.
I agree with @richgalloway; I too am not convinced that your interpretation is correct. The way I look at it, SPL processing forms an event pipeline: each command takes each event from its input pipeline, processes it, and gets what it needs from that event. It doesn't go back and process the event multiple times. The stats command only outputs statistics events to its output pipeline once it has finished processing all the events on its input pipeline. Most, but not all, commands work in this way (streamstats could be seen as a notable exception).
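A toy run makes the distinction visible (illustration only):
| makeresults count=5
| streamstats count as n
| stats sum(n) as total
Here streamstats annotates each of the five events with a running count (n=1..5) as they stream past, while stats emits its single row (total=15) only after it has consumed all five events from its input pipeline.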