Splunk Search

help on appendcols subsearch

jip31
Motivator

hi

I use the search below in order to calculate a percentage but I have a wrong result
I am explaining
When I execute the first part of the search (before appendcols), I have 66 events for the SITE "BREG"
When I execute the second part of the search (after appendcols), I have 77 events for the SITE "BREG"
So normaly, the percentage must be 85,7%
But I obtain 942% in results because the first part of the search returns well 666 events, but the second part of the search (NbIndHost) returns 7 events! (66/7)*100=942
And when I am doing debug I note que 7 correspond to another SITE than "BREG"!
So what is wrong in my subsearch please??

[| inputlookup host.csv 
    | table host] `CPU` 
| where process_cpu_used_percent>80 
| lookup test.csv HOSTNAME as host output SITE 
| stats  dc(host) as NbHostProcessSup80 by SITE
| appendcols 
    [| inputlookup host.csv 
    | lookup test.csv HOSTNAME as host output SITE 
    | stats  dc(host) as NbIndHost by SITE ] 
| eval Perc=round((NbHostProcessSup80/NbIndHost)*100,2) 
| search SITE=*Breg*
Tags (1)
0 Karma
1 Solution

HiroshiSatoh
Champion

I think JOIN is good because there are few results.

 | appendcols …
↓
 | join SITE
     [| inputlookup host.csv 
     | lookup test.csv HOSTNAME as host output SITE 
     | stats  dc(host) as NbIndHost by SITE ] 

View solution in original post

0 Karma

FrankVl
Ultra Champion

The appendcols command does not in any way guarantee that the rows correlate correctly. So unless you take care of that in the two parts of your search, you will indeed get incorrect results if the SITES are in a different order, or (what is most likely happening in this case) your first, filtered, search returns a different number of rows than your subsearch.

You'd better use a join for this, or get rid of the subsearch altogether. For example:

[| inputlookup host.csv 
     | table host] `CPU` 
 | lookup test.csv HOSTNAME as host output SITE 
 | eventstats  dc(host) as NbIndHost by SITE
 | where process_cpu_used_percent>80
 | stats  dc(host) as NbHostProcessSup80 values(NbIndHost) as NbIndHost by SITE
 | eval Perc=round((NbHostProcessSup80/NbIndHost)*100,2) 
 | search SITE=*Breg*

This uses eventstats to first count the number of distinct hosts per site, then applies the filter for cpu usage, then performs the stats as before (just adding the values() part to carry over the total number of hosts). And then calculates the percentage as before.

No 100% guarantee that my example works, as I don't know your data or what is in that CPU macro. But it should be possible to use a combination of eventstats and stats to accomplish what you are after. If that is to complex, just use a join on the SITE field instead of appendcols.

0 Karma

HiroshiSatoh
Champion

I think JOIN is good because there are few results.

 | appendcols …
↓
 | join SITE
     [| inputlookup host.csv 
     | lookup test.csv HOSTNAME as host output SITE 
     | stats  dc(host) as NbIndHost by SITE ] 
0 Karma

jip31
Motivator

Many thanks

0 Karma
Get Updates on the Splunk Community!

Video | Welcome Back to Smartness, Pedro

Remember Splunk Community member, Pedro Borges? If you tuned into Episode 2 of our Smartness interview series, ...

Detector Best Practices: Static Thresholds

Introduction In observability monitoring, static thresholds are used to monitor fixed, known values within ...

Expert Tips from Splunk Education, Observability in Action, Plus More New Articles on ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...