Splunk Search

Find servers that sustained "% _total Processor times" greater than 80% for 5 minutes or more within a 24-hour period

Hudond
Path Finder

Good Afternoon

I am fairly new to Splunk and I am trying to figure out the best way to approach this.

I am running the Windows TA add-on to monitor a small number of servers.

I would like to identify servers that may have resource limitations: those that, within a 24-hour period, show % _total Processor times greater than 80% sustained for 5 minutes or more.

It is that 5-minute window that I am getting caught on. Most, if not all, servers will exceed 80% processor time in spurts of a minute or less during normal use, but it is the sustained processor time that I want to capture, since that could indicate a resource issue.

Any advice or guidance on a good approach to capture that information would be appreciated.

Thank you

 

1 Solution

bowesmana
SplunkTrust
SplunkTrust

You can use streamstats to capture window type behaviour, e.g.

 

| streamstats time_window=5m avg("% _total Processor times") as CPUMetric by host
| where CPUMetric>80
| stats values(host)

 

That's using average CPU over the 5 minute window, but you could also use any of the stats aggregations, e.g.

min() to find servers that never reported below 80% during the window

perc10() to find servers that reported over 80% for about 90% of the reports during the window

and so on. Hope this helps
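
To make the difference between the aggregations concrete, here is a minimal Python sketch (not Splunk code) of a trailing 5-minute window over one host's readings. The sample data, the function names, the `min_samples` guard, and the exact window boundary are all assumptions made for illustration; streamstats may treat the window edge differently.

```python
from statistics import mean


def trailing_windows(samples, window_s=300):
    """Yield (timestamp, values) for the trailing window ending at each sample.

    `samples` is a time-ordered list of (epoch_seconds, cpu_percent) pairs.
    A reading belongs to the window if it is less than `window_s` seconds old.
    """
    for i, (ts, _) in enumerate(samples):
        yield ts, [v for t, v in samples[: i + 1] if ts - t < window_s]


def flagged(samples, agg, threshold=80.0, window_s=300, min_samples=5):
    """Return timestamps whose trailing window aggregates above the threshold.

    Windows with fewer than `min_samples` readings are skipped so that a
    partially filled window at the start of the data is not reported.
    """
    return [
        ts
        for ts, vals in trailing_windows(samples, window_s)
        if len(vals) >= min_samples and agg(vals) > threshold
    ]


# Hypothetical readings, one per minute: (seconds, CPU percent).
# The reading at 120 s is a brief dip to 60%.
cpu = [(0, 95), (60, 92), (120, 60), (180, 97), (240, 99),
       (300, 91), (360, 88), (420, 85), (480, 90)]

avg_hits = flagged(cpu, mean)  # average tolerates the dip
min_hits = flagged(cpu, min)   # minimum requires every reading above 80

print("avg > 80 at:", avg_hits)
print("min > 80 at:", min_hits)
```

With this data the average flags every full window, including those containing the dip, while the minimum only flags windows in which every reading stayed above 80%. Which behaviour counts as "sustained" is a choice for whoever builds the alert.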


DalJeanis
Legend

 

Try something like this:

your search that gets individual ratings in this form 

| fields _time host "% _total Processor times"
| bin _time span=1m
| stats min("% _total Processor times") as MinCPU by _time host
| eval MyWarning = case(MinCPU > 0.8, 1)
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4


Modify the threshold in the MyWarning eval to match the format your data actually returns: 80, 0.80, or whatever.
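
The same steps can be walked through outside Splunk. The following Python sketch (not Splunk code) bins readings into one-minute buckets, keeps each bucket's minimum, flags buckets above the threshold, and counts the flags inside a trailing 301-second window per host. The host names, the sample readings, and the function name `sustained_minutes` are hypothetical, and the threshold is written as 80 on the assumption that the data is reported as a percentage.

```python
from collections import defaultdict


def sustained_minutes(events, threshold=80.0, window_s=301, needed=5):
    """Return sorted (host, minute) pairs where at least `needed` flagged
    one-minute buckets fall inside the trailing `window_s` seconds.

    `events` is an iterable of (epoch_seconds, host, cpu_percent).
    """
    # Step 1: bucket readings per host per minute (like `bin _time span=1m`).
    per_minute = defaultdict(list)
    for ts, host, cpu in events:
        per_minute[(host, ts - ts % 60)].append(cpu)

    # Step 2: flag a minute only if its lowest reading beats the threshold.
    flagged = defaultdict(list)
    for host, minute in sorted(per_minute):
        if min(per_minute[(host, minute)]) > threshold:
            flagged[host].append(minute)

    # Step 3: count flagged minutes in the trailing window ending at each one.
    hits = []
    for host, minutes in flagged.items():
        for m in minutes:
            in_window = sum(1 for x in minutes if 0 <= m - x < window_s)
            if in_window >= needed:
                hits.append((host, m))
    return sorted(hits)


# Hypothetical readings every 30 s for six minutes.
# web01 stays at 90%; web02 dips to 50% at 90 s and 210 s.
events = [(t, "web01", 90.0) for t in range(0, 360, 30)]
events += [(t, "web02", 50.0 if t in (90, 210) else 90.0)
           for t in range(0, 360, 30)]

result = sustained_minutes(events)
print(result)
```

In this sketch a 301-second window can span six one-minute buckets, so requiring at least five flagged buckets tolerates one unflagged minute among the last six. Tighten the window or the count if every minute must qualify.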
