Solved: Re: help on appendcols subsearch

jip31 · ‎10-29-2019

hi

I use the search below in order to calculate a percentage but I have a wrong result
I am explaining
When I execute the first part of the search (before appendcols), I have 66 events for the SITE "BREG"
When I execute the second part of the search (after appendcols), I have 77 events for the SITE "BREG"
So normaly, the percentage must be 85,7%
But I obtain 942% in results because the first part of the search returns well 666 events, but the second part of the search (NbIndHost) returns 7 events! (66/7)*100=942
And when I am doing debug I note que 7 correspond to another SITE than "BREG"!
So what is wrong in my subsearch please??

[| inputlookup host.csv 
    | table host] `CPU` 
| where process_cpu_used_percent>80 
| lookup test.csv HOSTNAME as host output SITE 
| stats  dc(host) as NbHostProcessSup80 by SITE
| appendcols 
    [| inputlookup host.csv 
    | lookup test.csv HOSTNAME as host output SITE 
    | stats  dc(host) as NbIndHost by SITE ] 
| eval Perc=round((NbHostProcessSup80/NbIndHost)*100,2) 
| search SITE=*Breg*

HiroshiSatoh · ‎10-29-2019

I think JOIN is good because there are few results.

 | appendcols …
↓
 | join SITE
     [| inputlookup host.csv 
     | lookup test.csv HOSTNAME as host output SITE 
     | stats  dc(host) as NbIndHost by SITE ]

View solution in original post

FrankVl · ‎10-29-2019

The appendcols command does not in any way guarantee that the rows correlate correctly. So unless you take care of that in the two parts of your search, you will indeed get incorrect results if the SITES are in a different order, or (what is most likely happening in this case) your first, filtered, search returns a different number of rows than your subsearch.

You'd better use a join for this, or get rid of the subsearch altogether. For example:

[| inputlookup host.csv 
     | table host] `CPU` 
 | lookup test.csv HOSTNAME as host output SITE 
 | eventstats  dc(host) as NbIndHost by SITE
 | where process_cpu_used_percent>80
 | stats  dc(host) as NbHostProcessSup80 values(NbIndHost) as NbIndHost by SITE
 | eval Perc=round((NbHostProcessSup80/NbIndHost)*100,2) 
 | search SITE=*Breg*

This uses eventstats to first count the number of distinct hosts per site, then applies the filter for cpu usage, then performs the stats as before (just adding the values() part to carry over the total number of hosts). And then calculates the percentage as before.

No 100% guarantee that my example works, as I don't know your data or what is in that CPU macro. But it should be possible to use a combination of eventstats and stats to accomplish what you are after. If that is to complex, just use a join on the SITE field instead of appendcols.

HiroshiSatoh · ‎10-29-2019

I think JOIN is good because there are few results.

 | appendcols …
↓
 | join SITE
     [| inputlookup host.csv 
     | lookup test.csv HOSTNAME as host output SITE 
     | stats  dc(host) as NbIndHost by SITE ]

jip31 · ‎10-29-2019

Many thanks

help on appendcols subsearch

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes