Getting Data In

Alerting on number of times something happens in a timeframe

Splunk Employee
Splunk Employee

I'm monitoring CPU usage on a Windows server. What's the best way to create a search/alert if CPU usage goes over 80% for 30 minutes (as an example)

Tags (2)

Super Champion

You could use the transaction command with eval-based start/end conditions. It seems like there could be a better way to do this (I'm curious too), but this approach does seems to work based on a simple test that I ran. (But there could be corner cases, I'm not sure.)

sourcetype="WMI:CPUTime" | transaction host startswith=eval(PercentProcessorTime>=80) endswith=eval(PercentProcessorTime<80) | where duration>=1800

One questions I have is this: Do you want to include in your results a situation where the CPU is say running at 90% for 15 minutes, then it drops to 70% for less that a minute (just long enough for 1 WMI snapshot; perhaps due to a blocking condition) and then returns to 90% for another 20 minutes. Certainly this would seem to fit into the general criteria of what you are trying to find, but but it wouldn't technically match. Some kind of weighted average approach would probably allow this situation to be captured.

Get Updates on the Splunk Community!

.conf23 Registration is Now Open!

Time to toss the .conf-etti &#x1f389; —  .conf23 registration is open!   Join us in Las Vegas July 17-20 for ...

Don't wait! Accept the Mission Possible: Splunk Adoption Challenge Now and Win ...

Attention everyone! We have exciting news to share! We are recruiting new members for the Mission Possible: ...

Unify Your SecOps with Splunk Mission Control

In today’s post, I'm excited to share some recent Splunk Mission Control innovations. With Splunk Mission ...