Solved: find servers in 24 hour period that have sustained "% _total Processor times" greater than 80% for 5 mins or more

Hudond · 06-15-2020

Good Afternoon

I am fairly new to Splunk and I am trying to figure out the best way to approach this.

I am running the Windows TA add-on to monitor a small number of servers.

I would like to find servers that may have resource limitations, which show up as "% _total Processor times" greater than 80% sustained for 5 minutes or more at some point within a 24-hour period.

It is that 5-minute window that I am getting stuck on. Most, if not all, servers will show processor times above 80% in short spurts of a minute or less during normal use, but it is sustained high processor time, which could indicate a resource issue, that I want to capture.

Any advice or guidance on a good approach to capturing that information would be appreciated.

Thank you

bowesmana · 06-15-2020

You can use streamstats to capture window-type behaviour, e.g.

| streamstats time_window=5m avg("% _total Processor times") as CPUMetric by host
| where CPUMetric>80
| stats values(host)

That uses the average CPU over the 5-minute window, but you could also use any of the other stats aggregations, e.g.

min() to find servers that never reported below 80% during the window

perc10() to find servers where roughly 90% of the readings during the window were over 80%

and so on. Hope this helps

DalJeanis · 06-16-2020

Try something like this:

<your base search that returns the individual CPU readings>
| fields _time host "% _total Processor times"
| bin _time span=1m
| stats min("% _total Processor times") as minCPU by _time host
| eval MyWarning = case(minCPU > 0.8, 1)
| streamstats time_window=301s sum(MyWarning) as Warning5m by host
| where Warning5m > 4

Adjust the threshold in the MyWarning eval to match the format your data actually returns: 80 if it reports percentages, 0.80 if it reports fractions, and so on.
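The binned search above can be modelled in plain Python to see what each step does. This is a rough sketch, not Splunk code: it assumes (epoch_seconds, host, cpu) tuples, assumes the 301-second window at a one-minute bucket b covers buckets in (b - 301s, b], and the names `sustained_hosts`, `series` and `SAMPLE` and the sample data are hypothetical. Each minute counts as a warning only if its lowest reading is above the threshold, and a host is reported once more than four warning minutes fall inside one window.

```python
from collections import defaultdict


def sustained_hosts(readings, threshold=80.0, bucket_s=60, window_s=301, min_flagged=5):
    """Rough model of the binned search; returns hosts with sustained high CPU.

    readings: iterable of (epoch_seconds, host, cpu) tuples (hypothetical shape).
    """
    # bin _time span=1m | stats min(cpu) as minCPU by _time host
    bucket_min = {}
    for ts, host, cpu in readings:
        key = (host, ts - ts % bucket_s)
        bucket_min[key] = min(cpu, bucket_min.get(key, cpu))

    # eval MyWarning = case(minCPU > threshold, 1): keep the warning buckets per host
    warnings = defaultdict(list)
    for (host, bucket), lowest in sorted(bucket_min.items()):
        if lowest > threshold:
            warnings[host].append(bucket)

    # streamstats time_window=301s sum(MyWarning) as Warning5m by host
    # | where Warning5m > 4   (i.e. at least min_flagged warnings in one window)
    # Assumed window at bucket b: buckets in (b - window_s, b].
    # The count can only peak at a warning bucket, so checking those is enough.
    hits = []
    for host, buckets in sorted(warnings.items()):
        for b in buckets:
            in_window = sum(1 for other in buckets if b - window_s < other <= b)
            if in_window >= min_flagged:
                hits.append(host)
                break
    return hits


def series(host, values, step=20):
    """Hypothetical helper: one reading every `step` seconds, starting at t=0."""
    return [(i * step, host, value) for i, value in enumerate(values)]


SAMPLE = (
    series("web01", [90] * 21)               # 7 minutes, always above 80%
    + series("web02", [95, 95, 50] * 7)      # mostly high, but dips every minute
    + series("web03", [90] * 12 + [40] * 9)  # only 4 minutes above 80%
)

if __name__ == "__main__":
    print(sustained_hosts(SAMPLE))  # ['web01']
```

Under the boundary assumption above, a 301-second window spans six one-minute buckets, so Warning5m > 4 fires when at least five of the last six minutes stayed above the threshold, not only on five consecutive minutes. web02 is never reported because every minute contains a dip to 50%, and web03 falls short with four high minutes. Passing threshold=0.8 models data reported as fractions.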
