Splunk Search

How to "chain" filters and count and optimize query

pedropiin
Path Finder

Hi everyone

I just started working with Splunk and I have a query in which one of the steps is to count the number of instances where a certain field has value > 10. But I have to count the number of instances with value > 10, > 15, > 30, > 60, > 120 and > 180. The way I'm doing it now is just by executing separate counts, as follows:

<search>...
| eval var1=...
| stats 
  count(eval(var1 > 10)) as count10,
  count(eval(var1 > 15)) as count15,
  count(eval(var1 > 30)) as count30,
  count(eval(var1 > 60)) as count60,
  count(eval(var1 > 120)) as count120,
  count(eval(var1 > 180)) as count180
...

But I'm aware this is definitely not the optimal way as, to my understanding, this will go through all the instances and count the ones > 10, then will go through all the instances again counting the ones > 15 and so on. 

How would I execute this count making use of the fact that, e.g., to count the number of instances > 120, I can check only considering the set of instances > 60 and so on? That is, how do I chain these counts and use them as "filters"? 

It's important to note that I don't want to use "where var1 > 10" multiple times as I also need to compute other metrics related to the whole dataset (e.g., avg(var1)) and, to my understanding, using just one 

| stats count(eval(var > 10)) as count10

will "drop" all of the other columns of my query.
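To make it concrete, here's the shape of the result I'm after, sketched in Python with made-up numbers (not my real data): the per-threshold counts together with a whole-dataset metric like the average.

```python
# Made-up stand-in for the var1 values; thresholds match my query.
values = [5, 12, 17, 45, 130, 200]
thresholds = [10, 15, 30, 60, 120, 180]

# count10, count15, ... alongside avg(var1), all from the same data
counts = {t: sum(1 for v in values if v > t) for t in thresholds}
avg = sum(values) / len(values)

print(counts)           # {10: 5, 15: 4, 30: 3, 60: 2, 120: 2, 180: 1}
print(round(avg, 2))    # 68.17
```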

Anyways, how would I do this?

Thank you in advance.

1 Solution

livehybrid
SplunkTrust

Hi @pedropiin 

You could try something like the following. I've used makeresults to visualise this since I don't have your data.

| makeresults count=100
| streamstats count as var1
| eval N=case(var1>180,180,var1>120,120,var1>60,60,var1>30,30,var1>15,15,var1>10,10)
| eval count{N}=1
| fields - count
| stats sum(count*) AS count*
| fillnull value=0 count10 count15 count30 count60 count120 count180
| eval count10=count10+count15+count30+count60+count120+count180
| eval count15=count15+count30+count60+count120+count180
| eval count30=count30+count60+count120+count180
| eval count60=count60+count120+count180
| eval count120=count120+count180

This assumes you want count10 to include anything where var1 is over 10, even if it's also over 30.
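If it helps to see the logic outside SPL, here is the same bucket-then-roll-up idea in Python (using the same 1..100 sample as the makeresults above; not anything from your real data):

```python
# Each value lands in the bucket for the highest threshold it exceeds
# (the case() step), then buckets are rolled up so count10 includes
# everything above 10, not just values between 10 and 15.
thresholds = [180, 120, 60, 30, 15, 10]   # highest first, like the case()
values = range(1, 101)                    # mirrors makeresults + streamstats

buckets = {t: 0 for t in thresholds}
for v in values:
    for t in thresholds:                  # first threshold exceeded wins
        if v > t:
            buckets[t] += 1
            break

# Cumulative roll-up: lower thresholds pick up all higher buckets,
# mirroring the chained eval count10=count10+count15+... steps.
cumulative, running = {}, 0
for t in thresholds:                      # from 180 down to 10
    running += buckets[t]
    cumulative[t] = running

print(cumulative)   # {180: 0, 120: 0, 60: 40, 30: 70, 15: 85, 10: 90}
```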

Please let me know how you get on, and consider accepting this answer or adding karma to it if it has helped.
Regards

Will


pedropiin
Path Finder

First of all, thanks @livehybrid for your suggestion. It worked perfectly.

Now, regarding what @richgalloway and @ITWhisperer proposed: you're both right as well. I'm not sure I understood everything in the job inspector, but I ran multiple test queries, and my previous approach and the "new" approach don't make any difference: both take the same amount of time, with no big difference in the number of invocations of each command. So thanks for the heads up!



richgalloway
SplunkTrust

If your problem is resolved, then please click the "Accept as Solution" button to help future readers.

---
If this reply helps you, Karma would be appreciated.


richgalloway
SplunkTrust

@pedropiin wrote:

But I'm aware this is definitely not the optimal way as, to my understanding, this will go through all the instances and count the ones > 10, then will go through all the instances again counting the ones > 15 and so on. 

I'm not convinced this is correct.  Have you looked at the job inspector stats for this search?  I think you'll find it's not that inefficient.  Any attempt to "chain" filters is likely to perform much worse.

---
If this reply helps you, Karma would be appreciated.

ITWhisperer
SplunkTrust

I agree with @richgalloway; I too am not convinced that your interpretation is correct. The way I look at it, SPL processing forms an event pipeline: each command takes each event from its input pipeline and processes it, getting what it needs from that event. It doesn't go back and process the event multiple times. The stats command only outputs statistics events to its output pipeline once it has finished processing all the events on its input pipeline. Most, but not all, commands work this way (streamstats could be seen as a notable exception).
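A rough sketch of that single pass outside Splunk (made-up events, not any real command's internals): every event is visited exactly once, and all the accumulators are updated from that one visit.

```python
# stats-style accumulation: one visit per event, all aggregates updated
# together; the pipeline is never rewound.
thresholds = [10, 15, 30, 60, 120, 180]
counts = {t: 0 for t in thresholds}
total = n = 0

def process(event_var1):
    """Handle one event from the input pipeline, updating every accumulator."""
    global total, n
    n += 1
    total += event_var1            # feeds avg(var1)
    for t in thresholds:           # feeds count(eval(var1 > t))
        if event_var1 > t:
            counts[t] += 1

for var1 in [8, 20, 65, 200]:      # stand-in event stream
    process(var1)

print(counts)       # {10: 3, 15: 3, 30: 2, 60: 2, 120: 1, 180: 1}
print(total / n)    # 73.25
```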
