Splunk Search

Determine which CPU has a higher load than average

jhuysing
Explorer

I can create a query and produce a timechart so I can see the load across the set of CPUs:

 

|timechart values(VALUE) span=15m by cpu  limit=0

 

[Image: load.png]
I can see a trend that one CPU has a higher load.

I can also create a query using the stats command to get the max, mean, and range of the load value:

 

stats max(VALUE) as MaxV,  mean(VALUE) as MeanV,  range(VALUE) as Delta by _time

 

[Image: stats.png]
What I want to do is identify any CPU that is running at a higher load than the average plus some sort of fiddle factor.



ITWhisperer
SplunkTrust

It is not clear what your actual requirement is. Which average do you want to compare against: the average VALUE for that time period (15m) across all CPUs, or the average for that CPU across the whole time period?

Assuming the former, a "standard" way of deriving a "fiddle factor" is to calculate the standard deviation of the VALUEs in the time period (15m), and then determine for each CPU how many standard deviations its VALUE is above the mean. You might do it like this:

| eventstats mean(VALUE) as MeanV stdev(VALUE) as StDevV by _time
| eval exceedFactor=if(VALUE > MeanV,(VALUE - MeanV)/StDevV, 0)
| timechart values(exceedFactor) span=15m by cpu limit=0
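
If a list of offending CPUs is more useful than a chart, the same idea can be turned into a filter. The following is only a sketch under some assumptions that are not in the original post: <your_search> stands for the base search, events are bucketed into 15-minute bins first, each CPU's load per bin is summarised by its average (cpuLoad is a made-up field name), and the threshold of 1.5 is an arbitrary placeholder. The extra StDevV > 0 check avoids dividing by zero when every CPU reports the same value.

```
<your_search>
| bin _time span=15m
| stats avg(VALUE) as cpuLoad by _time cpu
| eventstats avg(cpuLoad) as MeanV stdev(cpuLoad) as StDevV by _time
| eval exceedFactor=if(StDevV > 0 AND cpuLoad > MeanV, (cpuLoad - MeanV) / StDevV, 0)
| where exceedFactor > 1.5
| table _time cpu cpuLoad MeanV StDevV exceedFactor
```

Note that with only a handful of CPUs the factor cannot grow very large, because a single outlier also inflates the standard deviation, so the threshold probably needs tuning against real data.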

jhuysing
Explorer

No, that's not right.
Putting the cpu field into the BY clause of the stats command doesn't give the mean value for the cluster.
It performs the stats on the individual CPUs.


gcusello
SplunkTrust

Hi @jhuysing ,

I don't know which data you are monitoring, but in any case you can add the CPU name to the stats BY clause.

Then you can create your own rule to fire an alert, e.g. the max value being more than 30% above the average, using a where condition.

In your case (using 30% more than MeanV):

<your_search>
| bin span=15m _time
| stats
     max(VALUE) AS MaxV
     mean(VALUE) AS MeanV
     range(VALUE) AS Delta
     BY _time cpu
| where MaxV>MeanV*1.3

If you use _time in the BY clause of the stats command, remember to add the bin command before it.

Ciao.

Giuseppe
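
On the objection raised in this thread, that grouping by cpu yields per-CPU statistics rather than a cluster-wide mean, one possible way to get both is to follow the per-CPU stats with an eventstats grouped by _time only. This is a sketch rather than a tested answer: <your_search> stands for the base search, CpuMeanV and ClusterMeanV are illustrative field names, and the 1.3 multiplier is the 30% margin used in the reply above.

```
<your_search>
| bin _time span=15m
| stats max(VALUE) AS MaxV avg(VALUE) AS CpuMeanV BY _time cpu
| eventstats avg(CpuMeanV) AS ClusterMeanV BY _time
| where CpuMeanV > ClusterMeanV * 1.3
```

Here stats collapses the events to one row per CPU per 15-minute bin, and eventstats then attaches the average across all CPUs in the same bin to each of those rows, so the where clause compares each CPU against the cluster rather than against itself.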
