Splunk Search

Find servers that sustain "% _total Processor times" greater than 80% for 5 minutes or more within a 24-hour period

Hudond
Path Finder

Good Afternoon

I am fairly new to Splunk and am trying to figure out the best way to approach this.

I am running the Windows TA add-on to monitor a small number of servers.

I would like to identify servers that may have resource limitations, which show up as % _total Processor time above 80% sustained for 5 minutes or more, within a 24-hour period.

It is that 5-minute window that I am getting stuck on. Most, if not all, servers show more than 80% processor time in short spurts of a minute or less during normal use, but it is the sustained processor load that I want to capture, because that could indicate a resource issue.

Any advice or guidance on a good approach to capturing that information would be appreciated.

Thank you


bowesmana
SplunkTrust

You can use streamstats to capture windowed behaviour, e.g.

 

| streamstats time_window=5m avg("% _total Processor times") as CPUMetric by host
| where CPUMetric>80
| stats values(host)

 

That uses the average CPU over the 5-minute window, but you could also use any of the other stats aggregations, e.g.

min() to find servers that never reported below 80% during the window

perc10() to find servers that reported over 80% for at least 90% of the readings during the window (perc90() > 80 would only mean that at least 10% of the readings were over 80%)

and so on. Hope this helps
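For example, a min() version of the search above flags a host only when every reading in the 5-minute window is above 80%. This is a sketch that reuses the field name from the question; substitute whatever field your events actually carry. The extra range(_time) check is an assumption-laden guard: with time_window, the first reading of a window is evaluated on its own, so a single spike at the start of the data could otherwise pass. The 240-second threshold assumes roughly one reading per minute and should be adjusted to your collection interval.

````spl
| streamstats time_window=5m min("% _total Processor times") as MinCPU5m range(_time) as WindowSecs by host ```lowest reading and time span covered in each 5-minute window```
| where MinCPU5m > 80 AND WindowSecs >= 240 ```assumes about one reading per minute, so 240 s means the window is actually filled```
| stats values(host) as hosts
````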


DalJeanis
Legend

 

Try something like this:

your search that returns the individual readings
| fields _time host "% _total Processor times"
| bin _time span=1m
| stats min("% _total Processor times") as minCPU by _time host
| eval MyWarning = case(minCPU > 0.8, 1)
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4


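Adjust the threshold in the MyWarning eval to match the format your data actually uses: 80 or 0.80, for example. (The per-minute minimum is renamed to minCPU because eval treats % as the modulo operator, so an unquoted min% does not parse.)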
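A sketch of the same per-minute approach run directly against Windows TA Perfmon data over the last 24 hours. It assumes the CPU events carry the usual Perfmon fields object, counter, instance and Value, with Value on a 0-100 scale, and it uses a hypothetical index name (your_windows_index); check all of these against your own events first. The final stats reports, per host, the first and last minutes at which the 5-minute condition was met and how many minutes met it, rather than just a list of hosts.

````spl
index=your_windows_index object=Processor counter="% Processor Time" instance=_Total earliest=-24h ```hypothetical index; field names assume standard Perfmon events```
| bin _time span=1m
| stats min(Value) as minCPU by _time host ```lowest _Total CPU reading in each minute, output sorted by _time```
| eval MyWarning = case(minCPU > 80, 1) ```assumes Value is a 0-100 percentage```
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4
| stats min(_time) as first_seen max(_time) as last_seen count as flagged_minutes by host
| convert ctime(first_seen) ctime(last_seen)
````

Because stats outputs rows ordered by _time, the events are already in time order, which streamstats needs when time_window is used.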
