Hi guys. I want to calculate downtime based on the number of requests that an application server processes. The downtime is calculated based on the following rules.
Below is an example of the result I want to calculate downtime on:
Here is my approach: compute the 80th percentile of the per-minute request counts, then classify every minute as up or down by comparing its count against that value.
index=_internal source=*web* req_time=* | bucket _time span=1m | stats count by _time | eventstats perc80(count) AS maxperc80 | eval status=if(count < maxperc80, "down", "up")
You will probably also want to count how many consecutive minutes each state lasts and exclude the outliers, such as very short runs.
Then sum the "down" minutes:
| stats count by status
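To make the "consecutive durations" idea concrete, here is a rough sketch that extends the search above. The field names prev_status, changed, run_id, run_minutes and down_minutes are ones I made up, and the 3-minute cutoff is only a placeholder, so tune it to whatever counts as an outlier in your data.

```spl
index=_internal source=*web* req_time=*
| bucket _time span=1m
| stats count by _time
| eventstats perc80(count) AS maxperc80
| eval status=if(count < maxperc80, "down", "up")
| streamstats current=f window=1 last(status) AS prev_status
| eval changed=if(isnull(prev_status) OR status!=prev_status, 1, 0)
| streamstats sum(changed) AS run_id
| eventstats count AS run_minutes by run_id
| where status="down" AND run_minutes>=3
| stats count AS down_minutes
```

The first streamstats picks up the previous minute's status, changed flags every minute where the status flips, and the running sum of those flags gives each unbroken run its own run_id. eventstats then attaches the run length to every minute, so short "down" runs can be dropped before the final count. One caveat: stats count by _time produces no row for a minute with zero matching events, so a complete outage will not show up here as "down" minutes.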
I know this is not what you asked, but your example shows an obvious 100% outage (full rather than partial), so why not use something like this:
... | streamstats current=f latest(_time) AS prevEventTime latest(_raw) AS prevEvent | eval downtime = abs(_time - prevEventTime) | where downtime > 100
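If you also want a total, a variation along these lines might work. It is only a sketch: the base search is borrowed from the earlier post as a stand-in for yours, the field names gap, outages and total_downtime_seconds are made up, and the 100-second threshold is simply the one used above.

```spl
index=_internal source=*web* req_time=*
| sort 0 _time
| streamstats current=f window=1 last(_time) AS prevEventTime
| eval gap = _time - prevEventTime
| where gap > 100
| stats count AS outages sum(gap) AS total_downtime_seconds
```

Sorting explicitly by _time means each event is compared with the one just before it, so gap is always positive regardless of the order the events come back in. Each gap longer than the threshold counts as one outage, and the sum of those gaps is the total downtime in seconds.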