Splunk Search

help on appendcols subsearch

jip31
Motivator

hi

I use the search below in order to calculate a percentage but I have a wrong result
I am explaining
When I execute the first part of the search (before appendcols), I have 66 events for the SITE "BREG"
When I execute the second part of the search (after appendcols), I have 77 events for the SITE "BREG"
So normaly, the percentage must be 85,7%
But I obtain 942% in results because the first part of the search returns well 666 events, but the second part of the search (NbIndHost) returns 7 events! (66/7)*100=942
And when I am doing debug I note que 7 correspond to another SITE than "BREG"!
So what is wrong in my subsearch please??

[| inputlookup host.csv 
    | table host] `CPU` 
| where process_cpu_used_percent>80 
| lookup test.csv HOSTNAME as host output SITE 
| stats  dc(host) as NbHostProcessSup80 by SITE
| appendcols 
    [| inputlookup host.csv 
    | lookup test.csv HOSTNAME as host output SITE 
    | stats  dc(host) as NbIndHost by SITE ] 
| eval Perc=round((NbHostProcessSup80/NbIndHost)*100,2) 
| search SITE=*Breg*
Tags (1)
0 Karma
1 Solution

HiroshiSatoh
Champion

I think JOIN is good because there are few results.

 | appendcols …
↓
 | join SITE
     [| inputlookup host.csv 
     | lookup test.csv HOSTNAME as host output SITE 
     | stats  dc(host) as NbIndHost by SITE ] 

View solution in original post

0 Karma

FrankVl
Ultra Champion

The appendcols command does not in any way guarantee that the rows correlate correctly. So unless you take care of that in the two parts of your search, you will indeed get incorrect results if the SITES are in a different order, or (what is most likely happening in this case) your first, filtered, search returns a different number of rows than your subsearch.

You'd better use a join for this, or get rid of the subsearch altogether. For example:

[| inputlookup host.csv 
     | table host] `CPU` 
 | lookup test.csv HOSTNAME as host output SITE 
 | eventstats  dc(host) as NbIndHost by SITE
 | where process_cpu_used_percent>80
 | stats  dc(host) as NbHostProcessSup80 values(NbIndHost) as NbIndHost by SITE
 | eval Perc=round((NbHostProcessSup80/NbIndHost)*100,2) 
 | search SITE=*Breg*

This uses eventstats to first count the number of distinct hosts per site, then applies the filter for cpu usage, then performs the stats as before (just adding the values() part to carry over the total number of hosts). And then calculates the percentage as before.

No 100% guarantee that my example works, as I don't know your data or what is in that CPU macro. But it should be possible to use a combination of eventstats and stats to accomplish what you are after. If that is to complex, just use a join on the SITE field instead of appendcols.

0 Karma

HiroshiSatoh
Champion

I think JOIN is good because there are few results.

 | appendcols …
↓
 | join SITE
     [| inputlookup host.csv 
     | lookup test.csv HOSTNAME as host output SITE 
     | stats  dc(host) as NbIndHost by SITE ] 
0 Karma

jip31
Motivator

Many thanks

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...