Alerting

How to create an alert that triggers when a host exceeds a certain value for X number of minutes?

tmontney
Builder

Example: Any host in the index exceeds 50% CPU usage for 5 minutes or more. So essentially, I need an alert when 5 events come in within a 5 minute span, all with a value above 50 from the same host.

To be clear: This does not mean that CPU usage can exceed 50% at 1 PM and exceed 50% at 1:05 PM, but not exceed 50% at 1:01-1:04 PM. In my case, I poll CPU usage once every minute. So, I'll have 5 events in 5 minutes. (Whether or not it dropped within a minute but came back is not of concern.)
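The condition described above can be sketched outside Splunk to make the requirement concrete. This is only an illustration in Python, not a Splunk search: the function name `hosts_over_threshold`, the tuple layout, and the sample hosts `pc-a` and `pc-b` are all hypothetical. It assumes one sample per host per minute and does not check that the samples are contiguous in time.

```python
from collections import defaultdict

def hosts_over_threshold(samples, threshold=50.0, required=5):
    """Return hosts whose `required` most recent samples all exceed `threshold`.

    `samples` is an iterable of (host, minute, cpu_percent) tuples, one per poll.
    """
    by_host = defaultdict(list)
    for host, minute, cpu in samples:
        by_host[host].append((minute, cpu))

    flagged = []
    for host, points in by_host.items():
        points.sort()                    # oldest sample first
        recent = points[-required:]      # the last `required` polls
        if len(recent) == required and all(cpu > threshold for _, cpu in recent):
            flagged.append(host)
    return sorted(flagged)

# Hypothetical data: pc-a stays above 50 for all five polls,
# pc-b is above 50 only at the first and last poll.
samples = [("pc-a", m, 80.0) for m in range(5)] + [
    ("pc-b", 0, 90.0), ("pc-b", 1, 10.0), ("pc-b", 2, 10.0),
    ("pc-b", 3, 10.0), ("pc-b", 4, 90.0),
]
print(hosts_over_threshold(samples))
```

Only `pc-a` is flagged here; `pc-b` is the "1 PM and 1:05 PM but not in between" case that should not alert.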

0 Karma

somesoni2
Revered Legend

Try something like this:

search:

your base search here host=* cpuusagefield>=50 | stats count by host | where count>=5

Time range:

Start time: -6m@m   End time: -1m@m

Cron Schedule: (runs every 5min starting from 1st min, 1,6,11,16,21,26...)

1-59/5 * * * *
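To double-check which minutes the `1-59/5` minute field covers, here is a small Python sketch. The helper `expand_minute_field` is hypothetical and only handles the `start-end/step` form used above, not cron syntax in general.

```python
def expand_minute_field(field):
    """Expand a cron minute field of the form 'start-end/step' into a list of minutes."""
    span, step = field.split("/")
    start, end = span.split("-")
    return list(range(int(start), int(end) + 1, int(step)))

minutes = expand_minute_field("1-59/5")
print(minutes)
```

That gives twelve runs per hour, starting at minute 1 and ending at minute 56, which matches the 1, 6, 11, 16, 21, 26... sequence listed above.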
0 Karma

tmontney
Builder

Strangely enough, after some tinkering we both came up with similar search queries. I ran it, and the search returned hosts that had fewer than 5 events. For instance, in the last 5 minutes I get a PC with 2 results.

Those two events are at 5:16:54 and 5:19:03, both within the target usage value. Then I decided to run a general query for that PC. I have Splunk poll CPU usage every 60 seconds for about 40 PCs. It also polls free RAM and disk idle time every 60 seconds for those same 40 PCs. To my surprise, I see 12 events in a 15-minute window (a query for this PC in the perfmon index, for any entries regarding CPU usage). I should see 15 events (or possibly 14). For instance, 5:16:54 and 5:14:21 are the last two events. Why is there a gap of over 2 minutes? The splunkd log doesn't say anything (no errors regarding this PC).
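The size of those gaps can be worked out from the timestamps quoted above. This is a rough Python sketch, not anything Splunk provides: the function `find_gaps`, the 15-second tolerance, and the placeholder date are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected=timedelta(seconds=60),
              tolerance=timedelta(seconds=15)):
    """Return (earlier, later, gap) for consecutive events spaced wider than expected + tolerance."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap > expected + tolerance:
            gaps.append((earlier, later, gap))
    return gaps

# Times of day are the ones quoted in the post; the date is a placeholder.
times = [
    datetime(2016, 1, 1, 5, 14, 21),
    datetime(2016, 1, 1, 5, 16, 54),
    datetime(2016, 1, 1, 5, 19, 3),
]
for earlier, later, gap in find_gaps(times):
    print(earlier.time(), "->", later.time(), gap.total_seconds(), "seconds")
```

With a 60-second polling interval, both gaps (153 and 129 seconds) are more than twice the expected spacing, so at least one poll is missing in each.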

0 Karma

frobinson_splun
Splunk Employee

Hi @tmontney,
This sounds like a real-time alert with rolling window triggering. You can review documentation for setting up this alert type here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/DefineRealTimeAlerts#Create_a_real-time_aler...

There's an example here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/Alertexamples#Real-time_alert_example

And here's a general comparison of alert types:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/AlertTypesOverview

Hope this helps!

MuS
SplunkTrust

Hi tmontney,

If I understand your request correctly, you can try something like this:

  your base search here host=* cpuusagefield>=50 | timechart span=5min count by host WHERE count > 5

You can learn more about timechart and its options here: http://docs.splunk.com/Documentation/Splunk/6.5.0/SearchReference/Timechart#Description

Hope this helps ...

cheers, MuS

0 Karma

tmontney
Builder

It doesn't work like I expect it to. For instance, it listed a host at 12:05 PM today with maxed-out usage. However, that was only for the one minute it was polled; 12:04 PM and 12:06 PM were very low.

0 Karma