Hi,
I need an alert query that monitors CPU usage every 5 minutes and sends an email when it matches 5 of 6 bad samples - i.e., if CPU utilization is greater than 95% for 5 out of 6 intervals (each interval 5 minutes apart), trigger an email with High importance.
So far I am failing to build the query that checks for 5 of 6 bad samples. Please assist me with this.
The question is - do you have data in your Splunk instance to find that? And is this data of sufficient quality?
Show us a sample.
Hi @PickleRick
Yes, we have months of data, and it is sufficient and accurate; CPU data is loaded into our instance from a number of systems every 5 minutes.
OK. Without digging too deeply into making the requirements more precise: assuming that you want to alert if at least 5 out of 6 consecutive measurements are "bad", you can do this in two different ways.
1) Assuming that you want only the latest state. You need to search at least half an hour into the past (6*5m=30m):
<your initial search> earliest=-35m
| head 6
| stats count(eval(your_condition_for_cpu_utilization_here)) as count
| eval result=if(count>=5,"High utilization","Normal")
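Note that head 6 here relies on the default reverse-chronological order of search results, so it keeps the 6 most recent events. Concretely, assuming your events carry a CpuUsage field and your threshold is 95% (an untested sketch - adjust the index, field name and threshold to your data):
index=your_cpu_index sourcetype=cpu earliest=-35m
| head 6
| stats count(eval(CpuUsage > 95)) as count
| eval result=if(count>=5,"High utilization","Normal")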
2) Assuming that you want to track it over a longer period:
<your initial search> earliest=some_longer_time_ago
| streamstats window=6 count(eval(your_condition_for_cpu_utilization_here)) as count
| eval result=if(count>=5,"High utilization","Normal")
You might add more logic to split it by host or something like that.
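For a separate 6-event sliding window per host, streamstats needs global=false; otherwise a single window is shared across all hosts. A sketch, assuming a CpuUsage field and a 95% threshold:
<your initial search> earliest=some_longer_time_ago
| sort 0 host _time
| streamstats global=false window=6 count(eval(CpuUsage > 95)) as count by host
| eval result=if(count>=5,"High utilization","Normal")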
Thanks @PickleRick for helping out here.
Based on your suggestion I constructed the query shown below.
However, I still need to fulfill the requirements below; please let me know.
Q1: How can I verify this by host?
Q2: head 6 - does it check one event per 5-minute interval each time?
Q3: Can we display the CPU percentage values of the last 6 intervals in a new column? e.g., 86, 92, 89, 45, 99, 90
index=* sourcetype=cpu host=* earliest=-35m
| rename "%_Idle_Time" as Percent_Idle_Time
| eval CpuUsage=coalesce(100-Percent_Idle_Time,100-PercentIdleTime)
| head 6
| stats count(eval(CpuUsage > 85)) as count
| eval result=if(count>=5,"High utilization","Normal")
That's a completely different story. And more complicated.
The 'head' command only shows the first N results; it doesn't distinguish by any field, so you have to count the results per host and limit them another way:
| streamstats count by host
| where count<=6
That's how you limit the events going into stats (you don't use the "head" command now!).
Then you can do your stats by each host:
| stats count(eval(CpuUsage > 85)) as count by host
And you can add the values of CpuUsage to that command. So instead of the last line you can do:
| stats count(eval(CpuUsage > 85)) as count values(CpuUsage) as CpuUsage by host
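Putting it together with the field extraction from your earlier search, the whole query might look something like this (an untested sketch):
index=* sourcetype=cpu host=* earliest=-35m
| rename "%_Idle_Time" as Percent_Idle_Time
| eval CpuUsage=coalesce(100-Percent_Idle_Time,100-PercentIdleTime)
| streamstats count by host
| where count<=6
| stats count(eval(CpuUsage > 85)) as count values(CpuUsage) as CpuUsage by host
| eval result=if(count>=5,"High utilization","Normal")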
Hi @PickleRick
Do you know a trick for averaging the CPU values from the last six events? I'm trying the query below, but avg(values(CpuUsage)) isn't working.
index=* sourcetype=cpu CPU=all host=* earliest=-35m
| rename "%_Idle_Time" as Percent_Idle_Time
| eval CpuUsage=coalesce(100-Percent_Idle_Time,100-PercentIdleTime)
| streamstats count by host
| where count<=6
| stats avg(values(CpuUsage)) as "Average of CpuUsage last 6 intervals(5mins range)" by host
Regards,
Satheesh
You can't do "avg(values(X))". values() will produce a multivalued field. Why not just avg(CpuUsage)?
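In other words, keep the rest of your search and just change the stats line - you can also keep the values() listing alongside the average in the same stats call (a sketch):
index=* sourcetype=cpu CPU=all host=* earliest=-35m
| rename "%_Idle_Time" as Percent_Idle_Time
| eval CpuUsage=coalesce(100-Percent_Idle_Time,100-PercentIdleTime)
| streamstats count by host
| where count<=6
| stats avg(CpuUsage) as "Average of CpuUsage last 6 intervals(5mins range)" values(CpuUsage) as CpuUsage by host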
Thank you so much @PickleRick for your help. It works for my requirement now.
I appreciate your time and support.