Getting Data In

Alerting on number of times something happens in a timeframe

Michael_Wilde
Splunk Employee
Splunk Employee

I'm monitoring CPU usage on a Windows server. What's the best way to create a search/alert if CPU usage goes over 80% for 30 minutes (as an example)

Tags (2)

Lowell
Super Champion

You could use the transaction command with eval-based start/end conditions. It seems like there could be a better way to do this (I'm curious too), but this approach does seems to work based on a simple test that I ran. (But there could be corner cases, I'm not sure.)

sourcetype="WMI:CPUTime" | transaction host startswith=eval(PercentProcessorTime>=80) endswith=eval(PercentProcessorTime<80) | where duration>=1800

One questions I have is this: Do you want to include in your results a situation where the CPU is say running at 90% for 15 minutes, then it drops to 70% for less that a minute (just long enough for 1 WMI snapshot; perhaps due to a blocking condition) and then returns to 90% for another 20 minutes. Certainly this would seem to fit into the general criteria of what you are trying to find, but but it wouldn't technically match. Some kind of weighted average approach would probably allow this situation to be captured.

Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...