Good Afternoon
I am fairly new to Splunk and I am trying to figure out the best way to approach this.
I am running the Windows TA add-on to monitor a small number of servers.
I would like to flag servers that may have resource limitations, shown by "% _total Processor times" above 80% at some point within a 24-hour period, but only when that level is sustained for 5 minutes or more.
It is that 5-minute window that I am getting caught on. Most, if not all, servers will go above 80% processor time in spurts of a minute or less during use, but it is the sustained processor time that I want to capture, since that could indicate a resource issue.
Any advice or guidance on a good approach to capture that information would be appreciated.
Thank you
You can use streamstats to capture window type behaviour, e.g.
| streamstats time_window=5m avg("% _total Processor times") as CPUMetric by host
| where CPUMetric>80
| stats values(host)
That's using average CPU over the 5 minute window, but you could also use any of the stats aggregations, e.g.
min() to find servers that never reported <80%
perc10() to find servers that reported over 80% for 90% of the reports during the window (if the 10th percentile is above 80, roughly 90% of the samples are too)
and so on. Hope this helps
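For what it's worth, here is a rough sketch of the min() variant as a full search. Everything about the base search is an assumption on my part (the index name perfmon, the sourcetype Perfmon:CPU, and the counter, instance and Value fields), so swap in whatever your Windows TA inputs actually produce. The Samples check is there because the first event in a window has a one-event "window", so a single spike would otherwise pass the min() test. The cutoff of 5 is also a guess that assumes about one sample per minute or faster; adjust it to your collection interval.

```
index=perfmon sourcetype="Perfmon:CPU" counter="% Processor Time" instance="_Total" earliest=-24h
| streamstats time_window=5m min(Value) as MinCPU count as Samples by host
| where MinCPU>80 AND Samples>=5
| stats count as SustainedWindows by host
```

Any host that comes back had at least one 5-minute stretch in the last 24 hours where every sample in the window was above 80.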
Try something like this:
your search that gets individual ratings in this form
| fields _time host "% _total Processor times"
| bin _time span=1m
| stats min("% _total Processor times") as MinPct by _time host
| eval MyWarning = case(MinPct > 0.8,1)
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4
Adjust the threshold in the MyWarning eval to match the scale your data actually uses: 80 if the values are reported as 0-100, 0.80 if they are fractions.