Hello,
We have both Windows and Linux environments. We want to set up an alert that sends an email if the CPU usage of a particular process stays at 100% for the past 10 minutes. Below are the searches I have for the CPU usage:
Linux:
host=yyyy index=* COMMAND=java USER=xxxxxx | timechart span=10m limit=0 avg(pctCPU) as "% of CPU Usage"
Windows:
index=* host=zzzz sourcetype="Perfmon:CPU" source="Perfmon:CPU" counter="% Processor Time" | timechart span=10m limit=0 avg(Value) as "% of CPU Usage"
You can use a real-time alert with a rolling window of 10 minutes with the following search:
Linux:
host=yyyy index=* COMMAND=java USER=xxxxxx | stats avg(pctCPU) as CPUUsage | where CPUUsage = 100
Windows:
index=* host=zzzz sourcetype="Perfmon:CPU" source="Perfmon:CPU" counter="% Processor Time" | stats avg(Value) as CPUUsage | where CPUUsage = 100
These searches return a result only when the average over the window is exactly 100. Provided no single sample can exceed 100, that can only happen if the usage has been at a constant 100% for the whole window.
You can then use the "Per-Result" trigger of the real-time alert, which fires whenever the search returns a result.
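If a sample can exceed 100 (for example, a multi-threaded process on a multi-core host), a variant that checks the minimum instead of the average states the "never dropped below 100" condition directly. This is only a sketch for the same 10-minute rolling window; the MinCPU field name and the >= comparison are my own assumptions, not part of the searches above.

Linux:

```
host=yyyy index=* COMMAND=java USER=xxxxxx | stats min(pctCPU) as MinCPU | where MinCPU >= 100
```

Windows:

```
index=* host=zzzz sourcetype="Perfmon:CPU" source="Perfmon:CPU" counter="% Processor Time" | stats min(Value) as MinCPU | where MinCPU >= 100
```

Either form returns a single row only when every sample in the window is at or above 100, so the same "Per-Result" trigger should still apply.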
Hi @akash5333,
Try creating a real-time alert with rolling time window triggering. This lets you monitor for conditions that occur within a particular time window (in this case, CPU usage over a 10-minute span).
See
http://docs.splunk.com/Documentation/Splunk/6.3.3/Alert/Definerolling-windowalerts
Hope this helps!
Hi @frobinson,
Here is the output of my query over a 10-minute span. I have set a rolling-window alert to send an email if the CPU usage is more than 10, but I never received the alert. Please let me know where I am going wrong.
2016-03-04 09:50:00
1.9
13.6
27.3
3.0
54.6
Hi @akash5333,
What are your trigger conditions? Are you throttling the alert at all?
Hi @frobinson,
Yes, I have set the throttle to 10 seconds. Here is the trigger condition:
Realtime Alert - search pctCPU>10 - in 10 seconds
Thanks--taking a look and I'll get back to you soon!
Hi @akash5333,
I'm not sure which query you are using. Is it one of the original queries you posted or one of the suggested queries in this post? I think there may be a couple of problems with the trigger condition. It sounds like your query renames the average CPU percentage, but your trigger condition checks a field from the original event data.
Keep in mind that a custom trigger condition is a secondary search applied to your base query's results. So you might need to double-check the query result fields to make sure you are using the right fields in the trigger condition.
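As a sketch of how the two pieces fit together, assume the base search is the suggested Linux query without its final where clause. The only field in its results is then CPUUsage, so the custom trigger condition has to reference that name. The threshold of 10 comes from your description and is only illustrative.

Base search:

```
host=yyyy index=* COMMAND=java USER=xxxxxx | stats avg(pctCPU) as CPUUsage
```

Custom trigger condition:

```
search CPUUsage > 10
```

A condition such as "search pctCPU>10" would match nothing here, because pctCPU no longer exists in the results after the stats command.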
Also, I'm not sure that the "pctCPU>10" and "in 10 seconds" parts of the condition match the alert scenario you described at first (100% CPU over 10 minutes). This might be something to double-check too.
Have you tried the suggested queries from @JMichaelis? They might match the scenario you want more closely.
Hope this helps!