Hi Everyone!
I have researched this issue and found a few solutions, though not completely. I followed this link:
https://answers.splunk.com/answers/557838/create-an-alert-based-on-cpu-being-at-95-for-a-spa.html
and wanted to know if I can use "%_Processor_Time" instead of CPUPct as I am not able to extract "CPUPct" field.
Also, I followed this link: https://answers.splunk.com/answers/693250/how-do-i-alert-if-cpu-is-greater-than-97-for-more.html
here, I wanted to understand what does "instance=Total" mean?
Also, which one of the accepted answers is better to use? The queries I used are as follows:
SPL 1:
index="perfmoncpu" | bin _time span=1m
| stats max(%_Processor_Time) as PercentProcessorTime by host _time
| eval PercentProcessorTime = round(PercentProcessorTime, 2)
| eval overload = if(PercentProcessorTime >= 90, 1, 0)
|streamstats current=f last(overload) as prevload by host
|eval newgroup=case(isnull(prevload),1, prevload!=overload,1, true(),0)
|streamstats sum(newgroup) as groupno by host
|eventstats count as LoadDuration by host groupno
| where overload = 1 and LoadDuration >= 10
| table host _time PercentProcessorTime LoadDuration
SPL 2:
index="perfmoncpu" source="PerfmonMk:CPU" instance=_Total
| sort 0 _time
| streamstats time_window=15min avg(cpu_load_percent) as last15min_load count by host
| eval last15min_load = if (count < 90,null,round(last15min_load, 2))
| where (last15min_load) >= 90
| table host, cpu_load_percent, last15min_load
I have used count<90 as the above SPL generates a count of 90 mins throughout
Please let me know if you guys have any further questions.
Thank You!
PS: I am a newbie trying to learn splunk!
... View more