Splunk Search

Find servers in a 24-hour period that have sustained "% _total Processor times" greater than 80% for 5 minutes or more

Hudond
Path Finder

Good Afternoon

I am fairly new to Splunk and I am trying to figure out the best way to approach this.

I am running the Windows TA add-on to monitor a small number of servers.

I would like to identify servers that may have resource limitations, shown by "% _total Processor times" values greater than 80% within a 24-hour period, but only when that level is sustained for 5 minutes or more.

It is that 5-minute window that I am getting caught on. Most, if not all, servers will exceed 80% processor time in spurts of a minute or less during use, but it is the sustained processor time that I want to capture, since that could indicate a resource issue.

Any advice or guidance on a good approach to capture that information would be appreciated.

Thank you

 

1 Solution

bowesmana
SplunkTrust

You can use streamstats to capture window type behaviour, e.g.

 

| streamstats time_window=5m avg("% _total Processor times") as CPUMetric by host
| where CPUMetric>80
| stats values(host)

 

That uses the average CPU over the 5-minute window, but you could also use any of the stats aggregations, e.g.

min() to find servers that never reported 80% or less during the window

perc10() to find servers that reported over 80% for roughly 90% of the reports during the window (perc90() would only require the top 10% of reports to exceed 80%)

and so on. Hope this helps
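To see why the choice of aggregation matters, here is a small Python sketch outside Splunk. It is a simplified model of a trailing 5-minute window, not a reproduction of how streamstats is implemented; the function names and the sample readings are hypothetical. It shows that an average can exceed 80% after a short burst, while a minimum only exceeds 80% when every reading in the window does.

```python
from statistics import mean

WINDOW_SECONDS = 300  # models time_window=5m; exact boundary handling is an assumption


def window_values(samples, end_ts, width=WINDOW_SECONDS):
    """Return the CPU values whose timestamp falls in the trailing window ending at end_ts."""
    return [cpu for ts, cpu in samples if end_ts - width < ts <= end_ts]


def exceeds(samples, end_ts, agg, threshold=80.0):
    """True if the aggregate of the trailing window is above the threshold."""
    values = window_values(samples, end_ts)
    return bool(values) and agg(values) > threshold


# Hypothetical one-per-minute readings: (seconds, CPU percent).
spiky = [(0, 100), (60, 100), (120, 70), (180, 70), (240, 70)]      # 2-minute burst
sustained = [(0, 85), (60, 90), (120, 95), (180, 88), (240, 91)]    # above 80 throughout

if __name__ == "__main__":
    for name, data in (("spiky", spiky), ("sustained", sustained)):
        print(name, "avg:", exceeds(data, 240, mean), "min:", exceeds(data, 240, min))
```

In this model the burst series passes the average test but fails the minimum test, which is the behaviour the question is asking to filter out. Note that a window near the start of the data holds fewer readings, so a check on the number of readings in the window may also be worth adding.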


DalJeanis
Legend

 

Try something like this:

your search that returns the individual readings in this form

| fields _time host "% _total Processor times"
| bin _time span=1m
| stats min("% _total Processor times") as minCPU by _time host
| eval MyWarning = case(minCPU > 0.8, 1)
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4


Modify the MyWarning eval to match the format your data actually returns: 80, 0.80, or whatever applies.
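The logic of that search (flag each one-minute bucket whose minimum is over the threshold, then count flags in a rolling window) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, readings are taken as percentages (threshold 80), there is exactly one bucket per minute with no gaps, and the 301-second window is assumed to cover the current bucket plus the five before it.

```python
def sustained_minutes(minute_mins, threshold=80.0, window_bins=6, required=5):
    """Return indices of minute buckets where at least `required` of the last
    `window_bins` buckets (including the current one) had a minimum above threshold.

    minute_mins: per-minute minimum CPU values for one host, in time order.
    """
    # 1 where the whole minute stayed above the threshold, else 0
    flags = [1 if value > threshold else 0 for value in minute_mins]
    hits = []
    for i in range(len(flags)):
        start = max(0, i - window_bins + 1)
        # Mirrors "Warning5m > 4": five or more flagged buckets in the window
        if sum(flags[start:i + 1]) >= required:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical host: 7 quiet minutes, 7 busy minutes, 6 quiet minutes
    long_spike = [40] * 7 + [95] * 7 + [40] * 6
    short_spike = [40, 95, 95, 95, 40, 40, 40, 40]
    print(sustained_minutes(long_spike))
    print(sustained_minutes(short_spike))
```

Because the count is taken over a trailing window, the first bucket after a long spike ends can still be reported; a three-minute burst never reaches the required count.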
