
How to create and trigger an alert if the CPU usage is constantly 100% for the past 10 minutes?

New Member

Hello,

We have both Windows and Linux environments. We want to set up an alert that sends an email if the CPU usage of a particular process stays at 100% for the entire past 10 minutes. Below are the searches I use for CPU usage:

Linux:

host=yyyy index=* COMMAND=java USER=xxxxxx | timechart span=10m limit=0 avg(pctCPU) as "% of CPU Usage"

Windows:

index=* host=zzzz sourcetype="Perfmon:CPU" source="Perfmon:CPU" counter="% Processor Time" | timechart span=10m limit=0 avg(Value) as "% of CPU Usage"

Path Finder

You can use a real-time alert with a rolling window of 10 minutes with the following search:

Linux:

host=yyyy index=* COMMAND=java USER=xxxxxx | stats avg(pctCPU) as CPUUsage | where CPUUsage = 100

Windows:

index=* host=zzzz sourcetype="Perfmon:CPU" source="Perfmon:CPU" counter="% Processor Time" | stats avg(Value) as CPUUsage | where CPUUsage = 100

These searches return a result only when the average over the window is 100. As long as the value cannot exceed 100, that can only happen if CPU usage was at 100% for the whole window.
You can then use the "Per-Result" trigger of the real-time alert, which fires whenever the search returns a result.
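A variant of the same idea, offered as a sketch rather than a tested solution: checking the minimum instead of the average states "every sample in the window was at 100%" directly, and it still behaves correctly if a value can go above 100 (for example, a multithreaded process on a multi-core Linux host can report more than 100%, depending on how pctCPU is collected). The host, user, and index values are the placeholders from the question. The `>=100` threshold, the extra `samples` count, and the `instance="_Total"` filter are my assumptions, so adjust them to your data. Also keep in mind that Perfmon:CPU reports processor-wide load (one instance per CPU plus a `_Total` instance), not the load of one particular process, so the Windows search (second line) only approximates the original requirement; the first line is the Linux search.

```spl
host=yyyy index=* COMMAND=java USER=xxxxxx | stats min(pctCPU) as minCPU, count as samples | where minCPU>=100

index=* host=zzzz sourcetype="Perfmon:CPU" source="Perfmon:CPU" counter="% Processor Time" instance="_Total" | stats min(Value) as minCPU, count as samples | where minCPU>=100
```

Use either search with the same 10-minute rolling real-time window and the "Per-Result" trigger. If no events arrive in the window, `stats` returns a row with an empty minCPU, which the `where` clause filters out, so no alert fires.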

Splunk Employee

Hi @akash5333,
Try creating a real-time alert with a rolling time window trigger. This lets you monitor for conditions that occur within a particular time window (in this case, CPU usage over a 10-minute span).

See
http://docs.splunk.com/Documentation/Splunk/6.3.3/Alert/Definerolling-windowalerts

Hope this helps!

New Member

Hi @frobinson,

Here is the output of my query over a 10-minute span. I have set up a rolling-window alert to send an email if CPU usage is more than 10, but I never receive the alert. Please let me know where I am going wrong.

2016-03-04 09:50:00
1.9
13.6
27.3
3.0
54.6

Splunk Employee

Hi @akash5333,
What are your trigger conditions? Are you throttling the alert at all?

New Member

Hi @frobinson,

Yes, I have set the throttle to 10 seconds. Here is the trigger condition:

Realtime Alert - search pctCPU>10 - in 10 seconds

Splunk Employee

Thanks--taking a look and I'll get back to you soon!

Splunk Employee

Hi @akash5333,
I'm not sure which query you are using: one of the original queries you posted, or one of the queries suggested in this thread? I think there may be a couple of problems with the trigger condition. Your query seems to rename the average CPU percentage (for example, to "% of CPU Usage" or CPUUsage), but your trigger condition checks pctCPU, a field from the original event data. After stats or timechart, pctCPU is no longer in the results, so a condition on it never matches.

Keep in mind that a custom trigger condition is a secondary search applied to your base query's results. So you might need to double-check the query result fields to make sure you are using the right fields in the trigger condition.
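For example, as a sketch based on the suggested queries: if the base search renames the average to CPUUsage, the custom trigger condition has to test CPUUsage rather than pctCPU, e.g. `search CPUUsage>10`. The threshold of 10 is the one from your test, and CPUUsage is simply the name chosen in the base search below.

```spl
host=yyyy index=* COMMAND=java USER=xxxxxx | stats avg(pctCPU) as CPUUsage
```

With the timechart version from the question, the result field is named "% of CPU Usage", which contains spaces and is awkward to reference in a condition, so renaming it to something like CPUUsage is simpler.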

Also, the "pctCPU>10" threshold and the "in 10 seconds" window don't seem to match the scenario you described at first (CPU at 100% for 10 minutes). This might be worth double-checking too.

Have you tried the suggested queries from @JMichaelis? They might match the scenario you want more closely.

Hope this helps!