Alerting

How do I create an alert that triggers when a host exceeds a certain value for X minutes?

tmontney
Builder

Example: any host in the index exceeds 50% CPU usage for 5 minutes or more. Essentially, I need an alert when 5 events come in within a 5-minute span, all from the same host and all with a value above 50.

To be clear: the alert should not fire if CPU usage exceeds 50% at 1:00 PM and again at 1:05 PM but stays at or below 50% from 1:01 to 1:04 PM. In my case, I poll CPU usage once every minute, so I'll have 5 events in 5 minutes. (Whether it dropped and came back within a single minute is not a concern.)
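The requirement above can be sketched outside Splunk to pin down the intended logic. This is a minimal Python illustration, not SPL and not anything from the thread: the function name, the tuple layout, and the choice to reset the streak when a one-minute poll is missing are all assumptions.

```python
from collections import defaultdict


def hosts_with_sustained_load(samples, threshold=50.0, required=5):
    """Return hosts with `required` consecutive one-minute samples above `threshold`.

    `samples` is an iterable of (host, minute, cpu_percent) tuples, where
    `minute` is an integer minute index (hypothetical layout, one poll per minute).
    """
    by_host = defaultdict(list)
    for host, minute, cpu in samples:
        by_host[host].append((minute, cpu))

    flagged = set()
    for host, points in by_host.items():
        points.sort()
        streak = 0
        prev_minute = None
        for minute, cpu in points:
            # Assumption: a missing poll breaks the streak.
            contiguous = prev_minute is not None and minute - prev_minute == 1
            if cpu > threshold:
                streak = streak + 1 if (contiguous and streak > 0) else 1
            else:
                streak = 0
            prev_minute = minute
            if streak >= required:
                flagged.add(host)
                break
    return flagged


# Hypothetical data: pc-a is above 50 for five straight minutes,
# pc-b dips at minute 1 and never reaches five in a row.
samples = [("pc-a", m, 80.0) for m in range(5)]
samples += [("pc-b", 0, 90.0), ("pc-b", 1, 10.0)]
samples += [("pc-b", m, 90.0) for m in range(2, 6)]
flagged = hosts_with_sustained_load(samples)
```

With this data only pc-a is flagged, which matches the 1:00 PM / 1:05 PM case described above: two high readings with low ones in between do not count.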


somesoni2
Revered Legend

Try something like this.

Search:

your base search here host=* cpuusagefield>=50 | stats count by host | where count>=5

Time range:

Start time: -6m@m   End time: -1m@m

Cron schedule: (runs every 5 minutes starting at minute 1 of each hour: 1, 6, 11, 16, 21, 26, ...)

1-59/5 * * * *
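
To see how the schedule and the time range line up, here is a small Python sketch. It is only an illustration: the helper names are made up, the parser handles just the `start-end/step` form used here, and the run date is arbitrary.

```python
from datetime import datetime, timedelta


def expand_cron_minutes(field):
    """Expand a cron minute field of the form 'start-end/step' into a list of minutes."""
    range_part, step = field.split("/")
    start, end = (int(part) for part in range_part.split("-"))
    return list(range(start, end + 1, int(step)))


def search_window(run_time):
    """Bounds for -6m@m and -1m@m: go back N minutes, then snap to the minute."""
    snapped = run_time.replace(second=0, microsecond=0)
    return snapped - timedelta(minutes=6), snapped - timedelta(minutes=1)


minutes = expand_cron_minutes("1-59/5")  # 1, 6, 11, ..., 56
# Hypothetical run a few seconds after the scheduled minute 21.
start, end = search_window(datetime(2016, 1, 1, 17, 21, 3))
```

Each run therefore looks at a 5-minute window that ends one minute before the run, which leaves a minute of slack for late-arriving events. With one poll per minute, a host that was above the threshold the whole time should contribute 5 events to that window.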

tmontney
Builder

Strangely, with enough tinkering, we both came up with similar search queries. I ran it, and the search returned hosts that had fewer than 5 events. For instance, in the last 5 minutes I get a PC with only 2 results.

Those were at 5:16:54 and 5:19:03, both matching the target usage value. Then I decided to run a general query for that PC. I have Splunk poll CPU usage every 60 seconds for about 40 PCs. It also polls free RAM and disk idle time every 60 seconds for those same 40 PCs. To my surprise, I see 12 events in a 15-minute window (a query for this PC in the perfmon index, for any entries regarding CPU usage). I should see 15 events (or possibly 14). For instance, 5:16:54 and 5:14:21 are the last two events. Why is there a gap of over 2 minutes? The splunkd log doesn't say anything (no errors regarding this PC).
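
The missing samples can be made explicit by comparing consecutive event times against the expected 60-second interval. The Python sketch below is only an illustration of that check: the function name, the 30-second tolerance, and the calendar date are assumptions, while the three times are the ones quoted in this post.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(seconds=60), tolerance=timedelta(seconds=30)):
    """Return (earlier, later, delta) for consecutive events spaced wider than expected."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        delta = later - earlier
        if delta > expected + tolerance:
            gaps.append((earlier, later, delta))
    return gaps


# Event times from the post; the date is arbitrary.
events = [
    datetime(2016, 1, 1, 5, 14, 21),
    datetime(2016, 1, 1, 5, 16, 54),
    datetime(2016, 1, 1, 5, 19, 3),
]
gaps = find_gaps(events)
gap_seconds = [gap[2].total_seconds() for gap in gaps]
```

Both intervals here exceed two minutes, so a search that requires 5 events in a 5-minute window can never match this host, even if every reading it did report was above the threshold.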


frobinson_splun
Splunk Employee

Hi @tmontney,
This sounds like a case for a real-time alert with rolling-window triggering. You can review the documentation for setting up this alert type here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/DefineRealTimeAlerts#Create_a_real-time_aler...

There's an example here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/Alertexamples#Real-time_alert_example

And here's a general comparison of alert types:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/AlertTypesOverview

Hope this helps!

MuS
Legend

Hi tmontney,

If I understand your request correctly, you can try something like this:

  your base search here host=* cpuusagefield>=50 | timechart span=5min count by host WHERE count > 5

You can learn more about timechart and its options here: http://docs.splunk.com/Documentation/Splunk/6.5.0/SearchReference/Timechart#Description

Hope this helps ...

cheers, MuS


tmontney
Builder

It doesn't work the way I expected. For instance, it listed a host at 12:05 PM today with maxed-out usage. However, that was only for the one minute in which it was polled; usage at 12:04 PM and 12:06 PM was very low.
