I can create a query and produce a time chart so I can see the load across the set of CPUs:
|timechart values(VALUE) span=15m by cpu limit=0
I can see a trend that one CPU has a higher load.
I can also create a query using stats to get the avg/max/range of the load value:
| stats max(VALUE) as MaxV, mean(VALUE) as MeanV, range(VALUE) as Delta by _time
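To make the aggregation concrete, here is a minimal Python sketch of what the stats line computes. It assumes events are first bucketed into 15-minute spans (as `bin span=15m _time` would do) and that each event is a hypothetical `(epoch_seconds, cpu, value)` tuple; the sample numbers are made up. Because cpu is not in the BY clause, MaxV, MeanV and Delta are taken over all CPUs in each bucket.

```python
from collections import defaultdict

SPAN_SECONDS = 15 * 60  # 15 minutes, matching span=15m


def stats_by_time(events):
    """Per 15-minute bucket, compute MaxV, MeanV and Delta over all CPUs.

    events: iterable of (epoch_seconds, cpu, value) tuples (hypothetical shape).
    """
    buckets = defaultdict(list)
    for ts, _cpu, value in events:
        # Floor the timestamp to its bucket start, like `bin span=15m _time`.
        buckets[ts - ts % SPAN_SECONDS].append(value)

    result = {}
    for bucket, values in sorted(buckets.items()):
        result[bucket] = {
            "MaxV": max(values),                    # max(VALUE)
            "MeanV": sum(values) / len(values),     # mean(VALUE)
            "Delta": max(values) - min(values),     # range(VALUE)
        }
    return result


# Hypothetical sample: three CPUs, two 15-minute buckets.
events = [
    (0, "cpu0", 10.0), (0, "cpu1", 20.0), (0, "cpu2", 60.0),
    (900, "cpu0", 15.0), (900, "cpu1", 25.0), (900, "cpu2", 35.0),
]
print(stats_by_time(events))
```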
What I want to do is identify any CPU that's running a higher load than the average plus some sort of fiddle factor.
It is not clear what your actual requirement is. Which average do you want to compare against: the average VALUE for that time period (15m) across all CPUs, or the average for that CPU across the whole time period?
Assuming the former, a "standard" way of building a "fiddle factor" is to compute the standard deviation of the VALUEs in the time period (15m) and then determine, for each CPU, how many standard deviations its VALUE is above the mean. You might do it like this:
| bin span=15m _time
| eventstats mean(VALUE) as MeanV stdev(VALUE) as StDevV by _time
| eval exceedFactor=if(VALUE > MeanV, (VALUE - MeanV)/StDevV, 0)
| timechart values(exceedFactor) span=15m by cpu limit=0
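The same calculation outside Splunk, as a Python sketch for checking the logic. The `(epoch_seconds, cpu, value)` event shape and the sample numbers are hypothetical, and it assumes Splunk's stdev() is the sample standard deviation, which is what `statistics.stdev` computes. Mean and standard deviation are taken per 15-minute bucket across all CPUs, and each event gets an exceedFactor of 0 unless its VALUE is above the bucket mean.

```python
import statistics
from collections import defaultdict

SPAN_SECONDS = 15 * 60  # span=15m


def exceed_factors(events):
    """Return (bucket, cpu, exceedFactor) for every event.

    events: iterable of (epoch_seconds, cpu, value) tuples (hypothetical shape).
    """
    # Group events into 15-minute buckets, like `bin span=15m _time`.
    by_bucket = defaultdict(list)
    for ts, cpu, value in events:
        by_bucket[ts - ts % SPAN_SECONDS].append((cpu, value))

    results = []
    for bucket, rows in sorted(by_bucket.items()):
        values = [value for _, value in rows]
        # eventstats mean(VALUE), stdev(VALUE) by _time: across ALL CPUs in the bucket.
        mean_v = statistics.mean(values)
        stdev_v = statistics.stdev(values) if len(values) > 1 else 0.0
        for cpu, value in rows:
            # eval exceedFactor=if(VALUE > MeanV, (VALUE - MeanV)/StDevV, 0),
            # with a guard against dividing by a zero standard deviation.
            if value > mean_v and stdev_v > 0:
                factor = (value - mean_v) / stdev_v
            else:
                factor = 0.0
            results.append((bucket, cpu, factor))
    return results


# Hypothetical sample: three CPUs, two 15-minute buckets.
events = [
    (0, "cpu0", 10.0), (0, "cpu1", 20.0), (0, "cpu2", 60.0),
    (900, "cpu0", 15.0), (900, "cpu1", 25.0), (900, "cpu2", 35.0),
]
for bucket, cpu, factor in exceed_factors(events):
    print(bucket, cpu, round(factor, 3))
```

A CPU whose exceedFactor stays above a threshold you choose (for example 1 or 2) is then the candidate you are after; that threshold plays the role of the fiddle factor.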
No, that's not right.
Putting the cpu into the BY clause of the stats command doesn't give the mean value for the cluster;
it's performing the stats on the individual CPUs.
Hi @jhuysing,
I don't know which data you are monitoring, but anyway, you can add the CPU name to the stats BY clause.
Then you can create your own rule to fire an alert using a where condition, e.g. max value more than 30% above the average.
In your case (using 30% more than MeanV):
<your_search>
| bin span=15m _time
| stats
max(VALUE) AS MaxV
mean(VALUE) AS MeanV
range(VALUE) AS Delta
BY _time cpu
| where MaxV>MeanV*1.3
If you use _time in the stats command, remember to add the bin command before it.
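For comparison, a Python sketch of the rule above, again using hypothetical `(epoch_seconds, cpu, value)` events with made-up values. Values are grouped per 15-minute bucket and per CPU, and a group is flagged when its MaxV exceeds 1.3 times its MeanV. Because cpu is in the grouping key, MeanV is each CPU's own mean within the bucket, not the cluster mean, which is the behaviour objected to earlier in the thread.

```python
from collections import defaultdict

SPAN_SECONDS = 15 * 60  # bin span=15m
FACTOR = 1.3            # "30% more than MeanV", as in the where clause


def flag_cpus(events, factor=FACTOR):
    """Return (bucket, cpu, MaxV, MeanV) for groups where MaxV > MeanV * factor.

    events: iterable of (epoch_seconds, cpu, value) tuples (hypothetical shape).
    """
    # stats ... BY _time cpu: one group per (15-minute bucket, CPU).
    groups = defaultdict(list)
    for ts, cpu, value in events:
        groups[(ts - ts % SPAN_SECONDS, cpu)].append(value)

    flagged = []
    for (bucket, cpu), values in sorted(groups.items()):
        max_v = max(values)
        mean_v = sum(values) / len(values)
        # where MaxV>MeanV*1.3
        if max_v > mean_v * factor:
            flagged.append((bucket, cpu, max_v, mean_v))
    return flagged


# Hypothetical sample: two CPUs, three samples each in one bucket.
events = [
    (0, "cpu0", 10.0), (300, "cpu0", 10.0), (600, "cpu0", 40.0),
    (0, "cpu1", 20.0), (300, "cpu1", 22.0), (600, "cpu1", 24.0),
]
print(flag_cpus(events))
```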
Ciao.
Giuseppe