Splunk Search

Find servers in a 24-hour period that have sustained "% _total Processor times" greater than 80% for 5 mins or more

Hudond
Path Finder

Good Afternoon

I am fairly new to Splunk and I am trying to figure out the best way to approach this.

I am running the Windows TA add-on to monitor a small number of servers.

I would like to identify servers that may have resource limitations: those that show % _total Processor times greater than 80%, sustained for 5 minutes or more, at any point within a 24-hour period.

It is that 5-minute window that I am getting caught on. Most, if not all, servers will exceed 80% processor time in spurts of a minute or less during normal use, but it is the sustained processor time that I want to capture, since that could indicate a resource issue.

Any advice or guidance on a good approach to capturing that information would be appreciated.

Thank you


bowesmana
SplunkTrust

You can use streamstats to capture window type behaviour, e.g.

 

| streamstats time_window=5m avg("% _total Processor times") as CPUMetric by host
| where CPUMetric>80
| stats values(host)

 

That's using average CPU over the 5 minute window, but you could also use any of the stats aggregations, e.g.

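min() to find servers that never reported below 80% during the window

perc10() to find servers that reported over 80% for 90% of the reports during the window (perc90() above 80 would only mean that the top 10% of reports were over 80%)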

and so on. Hope this helps
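
As a rough sketch of the min() variant, here is how the whole search might look. The field name Value is an assumption (substitute whatever field holds your CPU reading), and the first line is a placeholder for your own base search. The SpanSec check is an added guard that is not part of the answer above: without it, a single spike at the very start of the data would count as a "window" on its own. It assumes a sampling interval of 60 seconds or less.

```spl
<your base search over the last 24 hours, with the CPU reading in a field assumed to be named Value>
| streamstats time_window=5m min(Value) as MinCPU range(_time) as SpanSec by host
| where MinCPU>80 AND SpanSec>=240
| stats count as SustainedWindows by host
```

Each remaining row is a host that had at least one stretch of roughly five minutes in which no reading dropped to 80 or below. SustainedWindows counts how many events closed such a window.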


DalJeanis
Legend

 

Try something like this:

your search that gets individual ratings in this form 

| fields _time host "% _total Processor times"
| bin _time span=1m
| stats min("% _total Processor times") as MinCPU by _time host
| eval MyWarning = case(MinCPU > 0.8, 1)
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4


Modify the MyWarning eval to match the format your data actually returns: 80, 0.80, or whatever it is.
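
If you want one row per server rather than one row per flagged minute, a summarizing tail along these lines could be appended after the final where clause of the search above. This is only a sketch: the output names FlaggedMinutes, FirstSeen and LastSeen are made up for illustration.

```spl
| stats count as FlaggedMinutes min(_time) as FirstSeen max(_time) as LastSeen by host
| convert ctime(FirstSeen) ctime(LastSeen)
```

FirstSeen and LastSeen give the earliest and latest flagged minute per host in readable form, which helps tell a single sustained episode from several spread across the day.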
