Example: Any host in the index exceeds 50% CPU usage for 5 minutes or more. So essentially, I need an alert when 5 events come in within a 5 minute span, all with a value above 50 from the same host.
To be clear: This does not mean that CPU usage can exceed 50% at 1 PM and exceed 50% at 1:05 PM, but not exceed 50% at 1:01-1:04 PM. In my case, I poll CPU usage once every minute. So, I'll have 5 events in 5 minutes. (Whether or not it dropped within a minute but came back is not of concern.)
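To make the requirement concrete, here is a minimal Python sketch of the condition described above: a host is flagged only when it has 5 consecutive one-minute samples above 50. It is not a Splunk search. The function name, the `(host, minute, cpu_percent)` tuple format, and the integer minute index are assumptions made for illustration only.

```python
from collections import defaultdict


def hosts_over_threshold(samples, threshold=50.0, required=5):
    """Return the set of hosts with `required` consecutive one-minute
    samples whose CPU value is above `threshold`.

    `samples` is an iterable of (host, minute, cpu_percent) tuples, where
    `minute` is an integer minute index (a hypothetical format).
    """
    by_host = defaultdict(list)
    for host, minute, cpu in samples:
        by_host[host].append((minute, cpu))

    flagged = set()
    for host, points in by_host.items():
        points.sort()  # order each host's samples by minute
        streak = 0
        prev_minute = None
        for minute, cpu in points:
            if cpu > threshold:
                # Extend the streak only if this sample directly follows
                # the previous minute; a missing poll restarts the streak.
                if streak > 0 and minute == prev_minute + 1:
                    streak += 1
                else:
                    streak = 1
            else:
                streak = 0
            prev_minute = minute
            if streak >= required:
                flagged.add(host)
                break
    return flagged
```

With this logic, a host that is above 50 at minute 0 and again at minute 5 but has nothing in between is not flagged, and neither is a host that dips below 50 for one sample in the middle.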
Try something like this:
Search:
your base search here host=* cpuusagefield>=50 | stats count by host | where count>=5
Time range:
Start time: -6m@m End time: -1m@m
Cron Schedule: (runs every 5 minutes starting at minute 1: 1, 6, 11, 16, 21, 26, ...)
1-59/5 * * * *
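As a rough check on how this schedule and time range line up, the Python sketch below lists the minutes at which the cron expression fires and the five-minute window each run would search. The helper names are hypothetical, and it assumes the start of the range is inclusive and the end is exclusive.

```python
def run_minutes(start=1, step=5):
    """Minutes of the hour matched by the cron minute field 1-59/5."""
    return list(range(start, 60, step))


def search_window(run_minute):
    """Window searched by a run at `run_minute`, for -6m@m to -1m@m.

    Returned as (earliest, latest) minute offsets within the hour;
    earliest is assumed inclusive and latest exclusive. Negative values
    fall in the previous hour.
    """
    return (run_minute - 6, run_minute - 1)


windows = [search_window(m) for m in run_minutes()]
for m, (earliest, latest) in zip(run_minutes(), windows):
    print(f"run at minute {m:2d} searches minutes [{earliest}, {latest})")
```

Under those assumptions the windows sit back to back with no overlap and no gap. One consequence is that a five-minute stretch of high usage that straddles two windows (for example minutes 3 to 7) is split between them, so neither run would count 5 events for it.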
Strangely, with enough tinkering, we both came up with similar search queries. I ran it, and the search returned hosts that had fewer than 5 events. For instance, in the last 5 minutes I get a PC with 2 results: 5:16:54 and 5:19:03, both within the target usage value.
I then ran a general query for that PC. I have Splunk poll CPU usage every 60 seconds for about 40 PCs. It also polls free RAM and disk idle time every 60 seconds for those same 40 PCs. To my surprise, I see 12 events in a 15 minute window (a query for this PC in the perfmon index, for any entries regarding CPU usage). I should see 15 events (or possibly 14). For instance, 5:16:54 and 5:14:21 are the last two events. Why is there a gap of over 2 minutes? The splunkd log doesn't say anything (no errors regarding this PC).
Hi @tmontney,
This sounds like a real-time alert with rolling window triggering. You can review documentation for setting up this alert type here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/DefineRealTimeAlerts#Create_a_real-time_aler...
There's an example here:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/Alertexamples#Real-time_alert_example
And here's a general comparison of alert types:
http://docs.splunk.com/Documentation/Splunk/6.5.0/Alert/AlertTypesOverview
Hope this helps!
Hi tmontney,
If I understand your request correctly, you can try something like this:
your base search here host=* cpuusagefield>=50 | timechart span=5min count by host WHERE count > 5
You can learn more about timechart and its options here: http://docs.splunk.com/Documentation/Splunk/6.5.0/SearchReference/Timechart#Description
Hope this helps ...
cheers, MuS
It doesn't work the way I expect it to. For instance, it listed a host at 12:05 PM today with maxed-out usage. However, that was only for the one minute in which it was polled; 12:04 PM and 12:06 PM were very low.
you might have to try any one of these use cases below...