Solved: How to modify my search to calculate availability of multiple applications?

MattLingwood · ‎09-23-2016

I'm trying to calculate availability consistently across different applications that are all tested by the same tool.
Because the tool can be set to different test intervals and can run multiple tests against the same service, I need to normalise the data so the calculation accounts for different services and the tests beneath them.

The flow of my idea is this:
1. Break the search time period down into minute buckets (shortest interval time)
2. Fill in empty buckets with the contents of the previous bucket
3. Collapse any bucket with multiple events into a single result: if it contains at least one fail, the bucket is a fail
4. Calculate the availability for multiple services
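
To sanity-check the four steps outside Splunk, here is a minimal Python sketch of the same idea. It is not SPL and not the search from this thread. The function names (`minute_status`, `availability_by_service`) and the `(timestamp, service, result)` event shape are hypothetical. It also assumes a service counts as "up" until its first test result arrives.

```python
from datetime import datetime, timedelta

def minute_status(events, start, end):
    """Return {service: [0 or 1 per minute]}, where 1 marks a failed minute.

    events: iterable of (timestamp, service, result) tuples (hypothetical shape).
    start, end: datetime bounds of the search window; end is exclusive.
    """
    # Steps 1 and 3: truncate each event to its minute bucket;
    # a single "fail" in a bucket makes the whole bucket a fail.
    buckets = {}
    for ts, service, result in events:
        key = (service, ts.replace(second=0, microsecond=0))
        buckets[key] = buckets.get(key, False) or result == "fail"

    services = sorted({service for service, _ in buckets})
    status = {service: [] for service in services}
    # Assumption: a service is treated as "up" before its first result.
    previous = {service: 0 for service in services}

    minute = start.replace(second=0, microsecond=0)
    while minute < end:
        for service in services:
            if (service, minute) in buckets:
                previous[service] = int(buckets[(service, minute)])
            # Step 2: an empty bucket inherits the previous bucket's value.
            status[service].append(previous[service])
        minute += timedelta(minutes=1)
    return status

def availability_by_service(status):
    # Step 4: percentage of minutes in the window that were not failures.
    return {
        service: 100 - round(sum(flags) * 100 / len(flags), 2)
        for service, flags in status.items()
    }
```

Because every minute in the window ends up with exactly one value per service, the test interval of each service no longer affects the result.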

So far I have:

source=*** (service=*** OR service=***)
| bucket span=m _time                           
| stats values(result) AS partResult by _time service
| eval finalResult=if(partResult="fail","fail",partResult)
| timechart span=m count(eval(finalResult="fail")) as Failures BY service
| filldown

This tells me whether each service is up or down in any given minute, so different run intervals no longer matter.

The problem is that I cannot sum the failures over the time period, which I need in order to calculate a percentage against the selected time range.

I used timechart because it was the only way I knew to get a bucket for every minute in the selected time period.

somesoni2 · ‎09-23-2016

I'm not sure I understand what your final output should look like. Could you please provide the expected output, perhaps in tabular form (for the sum of total failures)?

The following suggestion makes some assumptions; give it a try.

Updated
Fixed eval command and adjusted to expected output.

 source=*** (service=*** OR service=***)
 | bucket span=m _time                           
 | stats count values(result) AS partResult by _time service
 | eval finalResult=if(isnotnull(mvfilter(match(partResult,"fail"))),1,0)
 | timechart span=m sum(finalResult) as Failures BY service
 | filldown
 | untable _time service failures
 | stats dc(_time) as TotalMinutes sum(failures) as TotalFailures by service 
 | eval Availability=100- round(TotalFailures*100/TotalMinutes,2)

MattLingwood · ‎09-23-2016

UPDATE:
Output should look like:
ServiceA, 100%
ServiceB, 99.92%
ServiceC, 100%

MattLingwood · ‎09-23-2016

Trying this, I get an error with the if statement.

I need to be able to calculate the availability of one or more services separately, so that ServiceA is reported independently of ServiceB, as shown in my update comment.

My algorithm to work it out was 100 - ( ( totalMinuteFailuresPerService / searchPeriodInMinutes ) * 100 )
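
As a quick check of that formula, a minimal Python sketch follows. The function name `availability` and its parameters are hypothetical. The rounding mirrors the `round(..., 2)` in the search above. One failed minute in a full day (1440 minutes) works out to roughly 99.93%.

```python
def availability(total_failures, total_minutes):
    """Percentage availability: 100 - (failed minutes / minutes in period) * 100."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    # Round the failure percentage to 2 decimals, as the search does.
    return 100 - round(total_failures * 100 / total_minutes, 2)

# One failed minute in a full day of one-minute buckets.
print(availability(1, 1440))
```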

somesoni2 · ‎09-23-2016

Try the updated answer

MattLingwood · ‎09-26-2016

One addition to this: how can I manipulate the data so that it shows availability on a day-by-day basis? See the example below.

Day    ServiceA  ServiceB
19/09  100%      100%
20/09  100%      100%
21/09  100%      100%
22/09  99.95%    100%

The main time range this will be done in is "Previous Week"
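
The day-by-day breakdown asked for here can be sketched in Python as a group-by on (day, service) over per-minute rows. This is only an illustration of the logic, not SPL. The function name `daily_availability` and the `(minute, service, failed)` row shape are hypothetical, and it assumes the rows are already gap-filled so that each service has one row per minute.

```python
from collections import defaultdict
from datetime import datetime

def daily_availability(minute_rows):
    """Return {day: {service: availability %}}.

    minute_rows: iterable of (minute: datetime, service: str, failed: 0 or 1),
    assumed to hold one row per service per minute (hypothetical shape).
    """
    # (day, service) -> [minutes seen, failed minutes]
    totals = defaultdict(lambda: [0, 0])
    for minute, service, failed in minute_rows:
        entry = totals[(minute.strftime("%d/%m"), service)]
        entry[0] += 1
        entry[1] += failed

    table = defaultdict(dict)
    for (day, service), (minutes, failures) in totals.items():
        table[day][service] = 100 - round(failures * 100 / minutes, 2)
    return dict(table)
```

The `%d/%m` key matches the format in the example table. It does not sort chronologically across months, so a longer range than "Previous Week" would need a sortable key such as `%Y-%m-%d`.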

somesoni2 · ‎09-26-2016

Try this:

 source=*** (service=*** OR service=***)
 | bucket span=m _time
 | stats count values(result) AS partResult by _time service
 | eval finalResult=if(isnotnull(mvfilter(match(partResult,"fail"))),1,0)
 | timechart span=m sum(finalResult) as Failures BY service
 | filldown
 | untable _time service failures
 | eval day=strftime(_time,"%m/%d/%Y")
 | stats dc(_time) as TotalMinutes sum(failures) as TotalFailures by day service
 | eval Availability=100- round(TotalFailures*100/TotalMinutes,2)
 | chart values(Availability) over day by service

MattLingwood · ‎09-26-2016

That's perfect, Thank you again!

MattLingwood · ‎09-26-2016

This is a great solution! Thank you for your help
