Solved: Re: How to calculate the availability of an application using the number of errors per minute?

nravichandran · 02-05-2015

I want to calculate the availability of an application. The logic I am using is based on the number of errors per minute,
so I am bucketing by _time and trying to derive availability from that. The search returns no results.

"Error"    | bucket span=1m _time   |  stats  count by _time as t_err | eval avail=86400-t_err |  eval AvailPct = round((avail/86400)*100,2)| timechart span=1m sum(AvailPct)|RENAME sum(AvailPct) as "Avail.Pct"

lguinn2 · 02-05-2015

First, I think there is a problem with your math: for each minute, you are counting the number of errors and then subtracting that count from the number of seconds in a day.
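To make the mismatch concrete, here is a minimal sketch in Python (not SPL) that reproduces only the arithmetic of the original search for a single one-minute bucket. The function name and the sample error counts are made up for illustration.

```python
SECONDS_PER_DAY = 86400

def original_avail_pct(errors_in_minute):
    """Mimic the original evals: avail=86400-t_err, then round(avail/86400*100, 2)."""
    avail = SECONDS_PER_DAY - errors_in_minute  # an error count minus a number of seconds
    return round((avail / SECONDS_PER_DAY) * 100, 2)

# Hypothetical per-minute error counts.
for errors in (1, 5, 100):
    print(errors, original_avail_pct(errors))
```

Even 100 errors in one minute comes out as 99.88, and 5 errors as 99.99, because an error count is being compared with the length of a whole day. The number says nothing about how long the application was actually unavailable.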

I think you will be better off deciding what is "up" and what is "down", and then determining (by minute or second) whether the application is available. For a single time slot, availability is not a percentage; it is binary (up or down). An availability percentage only makes sense across a time frame, such as a day.

Here is an idea for the chart:

"Error"
| bucket span=1m _time
| stats count as t_err by _time
| eval t_err=if(t_err>0,1,0)
| timechart span=1m max(t_err) as status

In this chart, the entire minute is counted as "down" if there were any errors during that minute. If you show this as a bar chart, there will be a bar for each minute where the application was "down".

To calculate the availability percentage by day:

"Error"
| bucket span=1s _time
| stats count as t_err by _time
| eval t_err=if(t_err>0,1,0)
| bucket span=1d _time
| stats sum(t_err) as totalSecsDown by _time
| eval Percent_Available = round((86400-totalSecsDown)*100/86400,2)
| timechart span=1d max(Percent_Available) as "Avail.Pct"

This calculates an availability percentage by day, based on the number of seconds down.

Note that in both cases, I defined t_err to be "1" if there are any errors. That way, when Splunk adds up t_err, the result is the number of seconds (or minutes) with errors, not the number of errors.
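The per-day logic can be sketched outside Splunk as well. The following Python is only an illustration of the same idea, not a replacement for the search: the function name and sample data are hypothetical, timestamps are assumed to be epoch seconds, and days are cut at UTC midnight (Splunk itself would bucket days according to the configured time zone).

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400

def daily_availability(error_epochs):
    """Map each day (epoch of UTC midnight) to its availability percentage.

    A second counts as "down" if at least one error falls in it,
    no matter how many errors occurred during that second.
    """
    down_seconds = defaultdict(set)
    for ts in error_epochs:
        second = int(ts)                          # roughly: bucket span=1s _time
        day = second - second % SECONDS_PER_DAY   # roughly: bucket span=1d _time
        down_seconds[day].add(second)             # the set plays the role of the 0/1 flag
    return {
        day: round((SECONDS_PER_DAY - len(secs)) * 100 / SECONDS_PER_DAY, 2)
        for day, secs in down_seconds.items()
    }

# Hypothetical data: 864 distinct seconds with errors, two errors in each second.
errors = [s + frac for s in range(864) for frac in (0.0, 0.5)]
print(daily_availability(errors))
```

With 864 down seconds out of 86400, the day comes out at 99.0 percent, regardless of how many errors landed in each second. A day with no errors at all does not appear in the result, which mirrors the search: it only retrieves events that match "Error".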


aholzer · 02-05-2015

Also, rather than trying to use rename I suggest you use "AS" inside of the timechart command itself. Like so:

"Error" | bucket span=1m _time | stats count as t_err by _time | eval avail=86400-t_err | eval AvailPct = round((avail/86400)*100,2) | timechart span=1m sum(AvailPct) as "Avail.Pct"

nravichandran · 02-05-2015

Thanks. When I used the second one (which is what I am looking for), I got an error, so I modified it by adding eval. The search now returns events, but there is no chart: nothing shows up under visualization or statistics.
