Splunk Search

Performant way to use "dedup" with group by

duesser
Path Finder

My question is basically the opposite of the one asked here:

https://community.splunk.com/t5/Splunk-Search/How-to-use-the-head-command-with-group-by/m-p/444439

I am looking for a performance improvement while keeping the search generic. As a minimal example, I created this:

| makeresults 
| eval data=split("1;1,1;2,2;1,2;2",",")
| mvexpand data
| eval data=split(data,";")
| eval a=mvindex(data,0), b=mvindex(data,1)
| table a b
| dedup a
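For reference, here is a small Python model (not Splunk code) of what the search above produces. It assumes dedup keeps the first result it encounters for each distinct value of a. The helper names build_rows and dedup are hypothetical.

```python
# Toy model of the minimal SPL example (not Splunk code).
# Same string that the search passes to split().
RAW = "1;1,1;2,2;1,2;2"


def build_rows(raw: str) -> list[dict[str, str]]:
    """Mimic split(",") + mvexpand + split(";") + mvindex: one row per pair."""
    rows = []
    for pair in raw.split(","):   # split(...,",") followed by mvexpand
        a, b = pair.split(";")    # split(data,";") then mvindex 0 and 1
        rows.append({"a": a, "b": b})
    return rows


def dedup(rows: list[dict[str, str]], field: str) -> list[dict[str, str]]:
    """Keep the first row seen for each distinct value of `field`, in input order."""
    seen: set[str] = set()
    kept = []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            kept.append(row)
    return kept


print(dedup(build_rows(RAW), "a"))  # [{'a': '1', 'b': '1'}, {'a': '2', 'b': '1'}]
```

The loop has to visit every row: without knowing the possible values of a, it can never rule out that a new value appears later.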

I know that I can speed up the search tremendously with a template like the following, which applies "| head 1" to each group of a:

| makeresults 
| append 
    [| makeresults 
    | eval data=split("1;1,1;2,2;1,2;2",",") 
    | mvexpand data 
    | eval data=split(data,";") 
    | eval a=mvindex(data,0), b=mvindex(data,1) 
    | table a b 
    | search a=1 
    | head 1
        ] 
| append 
    [| makeresults 
    | eval data=split("1;1,1;2,2;1,2;2",",") 
    | mvexpand data 
    | eval data=split(data,";") 
    | eval a=mvindex(data,0), b=mvindex(data,1) 
    | table a b 
    | search a=2 
    | head 1
        ] 
| search a=* 
| table a b
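The template can be modelled the same way: each appended subsearch filters on one hard-coded value and stops at its first match. This is a Python sketch, not Splunk internals. head1_per_group and KNOWN_GROUPS are hypothetical names, and KNOWN_GROUPS stands in for the values 1 and 2 written into the search.

```python
# Toy model of the hard-coded template (not Splunk internals).
RAW = "1;1,1;2,2;1,2;2"
# Same rows as the minimal example: one dict per "a;b" pair.
ROWS = [dict(zip(("a", "b"), pair.split(";"))) for pair in RAW.split(",")]

# Hypothetical stand-in for the values hard-coded in the search (a=1, a=2).
KNOWN_GROUPS = ["1", "2"]


def head1_per_group(rows, field, groups):
    """One 'search <field>=<value> | head 1' per listed value, results appended."""
    results = []
    for value in groups:
        # next() over a generator stops at the first match, like head 1.
        match = next((row for row in rows if row[field] == value), None)
        if match is not None:
            results.append(match)
    return results


print(head1_per_group(ROWS, "a", KNOWN_GROUPS))
```

Passing only ["1"] returns just the a=1 row. Any value of a that is missing from the list is silently dropped, which is exactly the genericity problem.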

However, this way the search is no longer generic: I have to know in advance which values "a" can take (1 and 2 in this example).

Question: Is there a way to improve the performance of dedup while keeping the search generic?

ITWhisperer
SplunkTrust

Do you mean something like this?

| stats first(*) as * by a
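A rough Python model of that stats call follows. It assumes (as I understand Splunk's stats functions, so worth verifying) that first() returns the first seen non-null value of each field independently, and that events without the by-field are dropped. stats_first_by is a hypothetical name, and groups come back in first-seen order, which may not match Splunk's output order.

```python
# Toy model of "| stats first(*) as * by a" (not Splunk internals).
# Assumptions: first() takes the first non-null value seen, field by field,
# and rows without the by-field are ignored.


def stats_first_by(rows, by):
    groups = {}
    for row in rows:
        key = row.get(by)
        if key is None:
            continue  # no value for the by-field: the row joins no group
        agg = groups.setdefault(key, {by: key})
        for field, value in row.items():
            if value is not None and field not in agg:
                agg[field] = value  # first non-null value wins, per field
    # First-seen group order; Splunk's ordering may differ.
    return list(groups.values())


ROWS = [{"a": "1", "b": "1"}, {"a": "1", "b": "2"},
        {"a": "2", "b": "1"}, {"a": "2", "b": "2"}]
print(stats_first_by(ROWS, "a"))  # [{'a': '1', 'b': '1'}, {'a': '2', 'b': '1'}]

# If the first event of a group lacks b, b is taken from a later event.
GAP = [{"a": "1", "b": None, "c": "x"}, {"a": "1", "b": "2", "c": "y"}]
print(stats_first_by(GAP, "a"))
```

On the example data this matches dedup a. The GAP case shows where the two may differ under the stated assumption: first(b) can take b from a later event, whereas dedup keeps the first event as a whole.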

duesser
Path Finder

Yes, this gives the same result! BUT on my real data it performs exactly like "| dedup", while the "| head 1" approach is roughly 15x faster.

ITWhisperer
SplunkTrust

I am not too surprised by that: head can discard events more quickly than stats. You could try removing the table command from the appended searches and keeping it only at the end, to see whether that speeds things up.
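One way to see why head can win is to count how many rows each approach has to read. The sketch below is a toy cost model in Python. It assumes cost is proportional to rows read, whereas real Splunk cost also depends on indexing, field extraction, and distribution. The data is synthetic and rows_read_by_head1 is a hypothetical helper.

```python
# Toy cost model: cost = number of rows read (an assumption, see above).


def rows_read_by_head1(rows, field, groups):
    """Rows read by one 'search <field>=<value> | head 1' per listed value."""
    read = 0
    for value in groups:
        for row in rows:
            read += 1
            if row[field] == value:
                break  # head 1: this subsearch stops at its first match
    return read


# Synthetic stream (illustration only): 10,000 rows cycling a = "1", "2", "3".
rows = [{"a": str(i % 3 + 1), "b": str(i)} for i in range(10_000)]

# A generic dedup/stats cannot stop early, so it reads every row.
print(len(rows))                                       # 10000
print(rows_read_by_head1(rows, "a", ["1", "2", "3"]))  # 6
print(rows_read_by_head1(rows, "a", ["4"]))            # 10000: value never occurs
```

The flip side shows in the last line: a listed value that never occurs makes its subsearch read the whole stream.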

duesser
Path Finder

My main search is already structured like that. I figured as much; however, I thought there might be a trick to dynamically use the distinct values of "a" and then vectorize the head command, or something along those lines. Thank you anyhow!
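The "vectorized head" idea could look roughly like a single pass that keeps the first row per value and stops as soon as every expected value has been seen. This is a hypothetical Python sketch, not an existing Splunk command. first_per_group_until_complete is an invented name, and rows whose value is not in the expected list are ignored, as in the hard-coded template.

```python
def first_per_group_until_complete(rows, field, expected):
    """Hypothetical single-pass 'head 1 per group' with early stop.

    Keeps the first row for each value in `expected` and stops reading once
    all of them have been found. Returns (kept_rows, rows_read).
    """
    remaining = set(expected)
    kept = []
    read = 0
    for row in rows:
        if not remaining:
            break  # every expected value found: stop reading
        read += 1
        if row[field] in remaining:
            remaining.discard(row[field])
            kept.append(row)
    return kept, read


# Synthetic stream (illustration only): 10,000 rows cycling a = "1", "2", "3".
rows = [{"a": str(i % 3 + 1), "b": str(i)} for i in range(10_000)]
kept, read = first_per_group_until_complete(rows, "a", ["1", "2", "3"])
print(read)  # 3
```

The catch is the same as with the template: the expected values still have to be supplied up front, and a value that never occurs forces a full read.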
