
How to modify my search to calculate availability of multiple applications?

MattLingwood
Engager

I'm looking into calculating availability consistently across different applications that are all tested by the same tool.
Because the testing tool can run at different intervals and can have multiple tests running against the same service, I need to normalise the data so that the calculation accounts for different services and the tests beneath them.

The flow of my idea is this:
1. Break the search time period down into one-minute buckets (the shortest test interval).
2. Fill in empty buckets with the contents of the previous bucket.
3. Reduce any bucket with multiple events to a single result: if there is at least one fail in it, the bucket is a fail.
4. Calculate the availability for multiple services, each separately.
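Steps 1 and 3 of this plan can be modelled outside Splunk as a minimal plain-Python sketch. The (epoch_seconds, service, result) tuple layout and the "pass"/"fail" result values are assumptions for illustration; in the search itself, bucket and stats do this work.

```python
from collections import defaultdict


def collapse_to_minutes(events):
    """Reduce raw test events to one status per (minute, service).

    events: iterable of (epoch_seconds, service, result) tuples; the
    "pass"/"fail" result values are assumed for this sketch.
    """
    seen = defaultdict(set)
    for ts, service, result in events:
        minute = int(ts) // 60 * 60  # step 1: floor to the start of the minute
        seen[(minute, service)].add(result)
    # step 3: a single "fail" anywhere in the bucket makes the minute a failure
    return {key: ("fail" if "fail" in results else "pass")
            for key, results in seen.items()}


events = [
    (0, "ServiceA", "pass"),
    (30, "ServiceA", "fail"),   # second test on ServiceA in the same minute
    (65, "ServiceA", "pass"),
    (10, "ServiceB", "pass"),
]
print(collapse_to_minutes(events))
```

Using a set per bucket means the number of tests in a minute does not matter, only whether any of them failed.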

So far I have:

source=*** (service=*** OR service=***)
| bucket span=m _time                           
| stats values(result) AS partResult by _time service
| eval finalResult=if(partResult="fail","fail",partResult)
| timechart span=m count(eval(finalResult="fail")) as Failures BY service
| filldown

This tells me whether each service is up or down at a given minute, so different run intervals no longer matter.
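Step 2 is what makes different run intervals irrelevant: once every minute carries a status, a test that runs every five minutes and one that runs every minute contribute the same number of buckets. A minimal plain-Python sketch, assuming per-minute statuses keyed by (minute_epoch, service) and minute-aligned start/end times (both hypothetical inputs):

```python
def fill_forward(statuses, service, start, end):
    """Return one status per minute in [start, end) for a single service.

    statuses maps (minute_epoch, service) -> "pass"/"fail", one entry per
    minute that actually had a test. start and end are assumed to be
    minute-aligned epoch seconds. A minute with no test inherits the
    previous minute's status (step 2 of the plan); minutes before the
    first test stay None because there is nothing to copy yet.
    """
    series = []
    last = None
    for minute in range(start, end, 60):
        last = statuses.get((minute, service), last)
        series.append(last)
    return series


# A test that runs every 5 minutes still yields one status per minute.
five_minute_test = {(0, "ServiceC"): "pass", (300, "ServiceC"): "fail"}
print(fill_forward(five_minute_test, "ServiceC", 0, 600))
```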

The problem is that I cannot sum the total failures over the time period, which I need in order to calculate a percentage against the selected time range.

I used timechart because it was the only way I knew to get every minute of the selected time period as a bucket.

somesoni2
Revered Legend

I'm not sure I have a clear picture of what your final output should look like. Could you provide your expected output (in tabular form, perhaps) for the total sum of failures?

The following suggestion makes some assumptions; give it a try.

Update: fixed the eval command and adjusted the search to match the expected output.

 source=*** (service=*** OR service=***)
 | bucket span=m _time                           
 | stats count values(result) AS partResult by _time service
 | eval finalResult=if(isnotnull(mvfilter(match(partResult,"fail"))),1,0)
 | timechart span=m sum(finalResult) as Failures BY service
 | filldown
 | untable _time service failures
 | stats dc(_time) as TotalMinutes sum(failures) as TotalFailures by service 
 | eval Availability=100- round(TotalFailures*100/TotalMinutes,2)
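To see what the last three commands of this search compute, here is a plain-Python model (a sketch, not Splunk's implementation). It assumes the timechart output is a list of (minute_epoch, {service: failures}) rows; skipping empty cells is an assumption of this sketch rather than a statement about how untable behaves.

```python
from collections import defaultdict


def availability_by_service(wide_rows):
    """Plain-Python model of untable + stats + eval from the search above.

    wide_rows: list of (minute_epoch, {service: failures}) pairs, i.e. one
    timechart row per minute with one 0/1 column per service.
    """
    minutes = defaultdict(set)   # dc(_time) per service
    failures = defaultdict(int)  # sum(failures) per service
    for minute, columns in wide_rows:
        for service, value in columns.items():  # untable: wide -> long rows
            if value is None:
                continue  # assumption: empty cells are not counted
            minutes[service].add(minute)
            failures[service] += value
    # eval Availability=100- round(TotalFailures*100/TotalMinutes,2)
    return {service: 100 - round(failures[service] * 100 / len(minutes[service]), 2)
            for service in minutes}


rows = [
    (0,   {"ServiceA": 0, "ServiceB": 1}),
    (60,  {"ServiceA": 0, "ServiceB": 0}),
    (120, {"ServiceA": 0, "ServiceB": 0}),
    (180, {"ServiceA": 0, "ServiceB": 0}),
]
print(availability_by_service(rows))
```

The sample rows are hypothetical. Python's round() rounds exact ties to even, so the last decimal can occasionally differ from Splunk's round().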

MattLingwood
Engager

UPDATE:
Output should look like:
ServiceA, 100%
ServiceB, 99.92%
ServiceC, 100%

MattLingwood
Engager

Trying this, I get an error with the if statement.

I need to calculate the availability of one or more services separately, so that ServiceA is treated completely independently of ServiceB, as shown in my update comment.

My formula for working it out is: 100 - ((totalMinuteFailuresPerService / searchPeriodInMinutes) * 100)
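The formula can be checked with a small worked example. The sketch below is plain Python; the figure of 8 failed minutes is hypothetical, picked to show how a 99.92% result can arise over a 7-day "Previous Week" range of 7 × 24 × 60 = 10080 one-minute buckets (assuming no daylight-saving change in that week).

```python
def availability(failed_minutes, period_minutes):
    """100 - ((totalMinuteFailuresPerService / searchPeriodInMinutes) * 100)"""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100 - (failed_minutes / period_minutes) * 100


WEEK_MINUTES = 7 * 24 * 60  # 10080 one-minute buckets in a 7-day range
print(round(availability(8, WEEK_MINUTES), 2))  # hypothetical 8 failed minutes; prints 99.92
```

Because the denominator is the number of minutes in the period rather than the number of test runs, services tested at different intervals are measured on the same scale.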

somesoni2
Revered Legend

Try the updated answer

MattLingwood
Engager

One addition: how can I manipulate the data so that it shows availability on a day-by-day basis? See the example:

Day ServiceA ServiceB
19/09 100% 100%
20/09 100% 100%
21/09 100% 100%
22/09 99.95% 100%

The main time range this will be run over is "Previous Week".

somesoni2
Revered Legend

Try this:

source=*** (service=*** OR service=***)
  | bucket span=m _time                           
  | stats count values(result) AS partResult by _time service
  | eval finalResult=if(isnotnull(mvfilter(match(partResult,"fail"))),1,0)
  | timechart span=m sum(finalResult) as Failures BY service
  | filldown
  | untable _time service failures | eval day=strftime(_time,"%m/%d/%Y")
  | stats dc(_time) as TotalMinutes sum(failures) as TotalFailures by day service 
  | eval Availability=100- round(TotalFailures*100/TotalMinutes,2)
  | chart values(Availability) over day by service
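The per-day version differs from the earlier search in grouping by day as well as service and then pivoting with chart. A minimal plain-Python model of that grouping and pivot, assuming long-format (minute_epoch, service, failures) rows like those untable produces, and computing days in UTC (an assumption of this sketch):

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_availability(rows):
    """Plain-Python model of the per-day search above.

    rows: iterable of (minute_epoch, service, failures) tuples.
    Returns {day: {service: availability}}, the shape of
    "chart values(Availability) over day by service".
    Days are taken in UTC here; Splunk typically formats _time in the
    user's time zone, so day boundaries may differ.
    """
    minutes = defaultdict(set)
    failures = defaultdict(int)
    for ts, service, value in rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%m/%d/%Y")
        minutes[(day, service)].add(ts)    # dc(_time) by day service
        failures[(day, service)] += value  # sum(failures) by day service
    table = defaultdict(dict)
    for (day, service), seen in minutes.items():
        pct = failures[(day, service)] * 100 / len(seen)
        table[day][service] = 100 - round(pct, 2)
    return dict(table)


rows = [
    (0, "ServiceA", 0),
    (60, "ServiceA", 1),
    (86400, "ServiceA", 0),  # next UTC day
    (0, "ServiceB", 0),
]
print(daily_availability(rows))
```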

MattLingwood
Engager

That's perfect, thank you again!

MattLingwood
Engager

This is a great solution! Thank you for your help
