Alerting on number of times something happens in a timeframe

Splunk Employee

I'm monitoring CPU usage on a Windows server. What's the best way to create a search/alert that fires if CPU usage stays over 80% for 30 minutes (as an example)?

Super Champion

You could use the transaction command with eval-based start/end conditions. There may be a better way to do this (I'm curious too), but this approach does seem to work based on a simple test that I ran. There could be corner cases I haven't hit, though.

sourcetype="WMI:CPUTime" | transaction host startswith=eval(PercentProcessorTime>=80) endswith=eval(PercentProcessorTime<80) | where duration>=1800

One question I have is this: do you want your results to include a situation where the CPU runs at, say, 90% for 15 minutes, drops to 70% for less than a minute (just long enough for one WMI snapshot, perhaps due to a blocking condition), and then returns to 90% for another 20 minutes? This certainly seems to fit the general criteria of what you are trying to find, but it wouldn't technically match the search above, because the single sub-80% event ends the transaction. Some kind of weighted-average approach would probably allow this situation to be captured.
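
As a rough sketch of that averaging idea (untested, and not necessarily the best way to do it), a rolling 30-minute average per host could be computed with streamstats instead of transaction. The sourcetype and the PercentProcessorTime field are taken from the search above; avg_cpu is just a field name I made up for illustration, and the 80 and 30m values are the example thresholds from the question.

```spl
sourcetype="WMI:CPUTime"
| sort 0 _time
| streamstats time_window=30m avg(PercentProcessorTime) as avg_cpu by host
| where avg_cpu>=80
```

The sort puts events in ascending time order, so each event's window covers the 30 minutes leading up to it. A single dip to 70% barely moves the average, so the 15-minute/20-minute case above would still match. The trade-off is that a short, very high spike can also pull the average over 80% even if the CPU was not continuously busy, and the first events in the search range are averaged over less than 30 minutes of data, so the alert's time range should be somewhat longer than the window.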
