I'm trying to calculate availability in a consistent way across different applications that are all tested by the same tool.
Because the tool that tests availability can run at different intervals and can have multiple tests running against the same service, I need to normalise the data so that the calculation accounts for the different services and the tests beneath them.
The flow of my idea is this:
1. Break the search time period down into one-minute buckets (the shortest test interval).
2. Fill in empty buckets with the contents of the previous bucket.
3. Reduce any bucket with multiple events to a single result: if it contains at least one fail, the bucket is a fail.
4. Calculate the availability for each of the services.
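To make the four steps concrete, here is a minimal sketch of the same logic in Python, outside Splunk. The function names, the `(epoch_seconds, service, result)` event shape, and the choice to treat minutes before a service's first event as available are all assumptions made for illustration; they are not part of the search itself.

```python
from collections import defaultdict

def minute_status(events, start_minute, end_minute):
    """Return {service: [status per minute]} for minutes in [start, end)."""
    # Steps 1 and 3: group events by (service, minute); any "fail" wins the bucket.
    buckets = defaultdict(dict)
    for epoch_seconds, service, result in events:
        minute = epoch_seconds // 60
        if result == "fail" or minute not in buckets[service]:
            buckets[service][minute] = result
    # Step 2: walk every minute and carry the previous status into empty buckets.
    filled = {}
    for service, by_minute in buckets.items():
        last = None  # assumption: minutes before the first event are unknown
        series = []
        for minute in range(start_minute, end_minute):
            last = by_minute.get(minute, last)
            series.append(last)
        filled[service] = series
    return filled

def availability(filled):
    # Step 4: percentage of minutes that are not "fail", per service.
    return {
        service: round(100 - 100 * series.count("fail") / len(series), 2)
        for service, series in filled.items()
    }

# Hypothetical sample: service A fails in minute 0 and recovers in minute 3.
events = [(0, "A", "pass"), (30, "A", "fail"), (180, "A", "pass"), (0, "B", "pass")]
result = availability(minute_status(events, 0, 5))
print(result)
```

In this sample, service A has one failing bucket that is carried forward over two empty minutes, so three of its five minutes count as failed.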
So far I have:
source=*** (service=*** OR service=***)
| bucket span=1m _time
| stats values(result) AS partResult by _time service
| eval finalResult=if(partResult="fail","fail",partResult)
| timechart span=1m count(eval(finalResult="fail")) as Failures BY service
| filldown
This tells me whether the service is up or down in each minute, so it doesn't matter that the tests run at different intervals.
The problem is that I cannot sum the failures over the time period, so I can't calculate a percentage against the selected time period.
I used timechart because it was the only way I knew to get every minute in the selected time period as a bucket.
I'm not sure I get the picture of what your final output should look like. Could you please provide the expected output, perhaps in tabular form (for the sum of total failures)?
The following suggestion makes some assumptions; give it a try.
Updated
Fixed eval command and adjusted to expected output.
source=*** (service=*** OR service=***)
| bucket span=1m _time
| stats count values(result) AS partResult by _time service
| eval finalResult=if(isnotnull(mvfilter(match(partResult,"fail"))),1,0)
| timechart span=1m sum(finalResult) as Failures BY service
| filldown
| untable _time service failures
| stats dc(_time) as TotalMinutes sum(failures) as TotalFailures by service
| eval Availability=100-round(TotalFailures*100/TotalMinutes,2)
UPDATE:
Output should look like:
ServiceA, 100%
ServiceB, 99.92%
ServiceC, 100%
So trying this, I get an error with the if statement.
I need to be able to calculate availability of 1 or more services separately. So that ServiceA is completely different to ServiceB as seen in my update comment.
My algorithm to work it out was 100 - ( ( totalMinuteFailuresPerService / searchPeriodInMinutes ) * 100 )
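As a quick sanity check of that formula, here is a small Python sketch. The function name and the sample numbers are hypothetical: 8 failed minutes in a 7-day period (10080 minutes) is just one combination that rounds to the 99.92% shown in the expected output, not data from the actual search.

```python
def availability_percent(failed_minutes, period_minutes):
    # 100 - ((totalMinuteFailuresPerService / searchPeriodInMinutes) * 100)
    return round(100 - (failed_minutes / period_minutes) * 100, 2)

week_minutes = 7 * 24 * 60  # 10080 minutes in a 7-day search period
print(availability_percent(8, week_minutes))
print(availability_percent(0, week_minutes))
```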
Try the updated answer
One more thing: how can I manipulate the data so that it shows availability on a day-by-day basis? See the example below.
Day ServiceA ServiceB
19/09 100% 100%
20/09 100% 100%
21/09 100% 100%
22/09 99.95% 100%
The main time range this will be done in is "Previous Week"
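The day-by-day breakdown amounts to grouping the per-minute results by calendar day before applying the same formula. Below is a minimal Python sketch of that grouping for a single service. The function name, the `{epoch_minute: 0 or 1}` input shape, and the use of UTC for day boundaries are assumptions for illustration; Splunk would use the searching user's time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_availability(minute_failures):
    """minute_failures maps epoch minute -> 1 (failed) or 0 (ok) for one service."""
    totals = defaultdict(lambda: [0, 0])  # day -> [failed minutes, total minutes]
    for minute, failed in minute_failures.items():
        # Assumption: days are cut at UTC midnight.
        day = datetime.fromtimestamp(minute * 60, tz=timezone.utc).strftime("%d/%m")
        totals[day][0] += failed
        totals[day][1] += 1
    return {
        day: round(100 - 100 * failed / total, 2)
        for day, (failed, total) in totals.items()
    }

# Hypothetical sample: two full days, with a single failed minute on the second day.
sample = {minute: 0 for minute in range(2 * 1440)}
sample[2000] = 1
daily = daily_availability(sample)
print(daily)
```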
Try like this
source=*** (service=*** OR service=***)
| bucket span=1m _time
| stats count values(result) AS partResult by _time service
| eval finalResult=if(isnotnull(mvfilter(match(partResult,"fail"))),1,0)
| timechart span=1m sum(finalResult) as Failures BY service
| filldown
| untable _time service failures
| eval day=strftime(_time,"%m/%d/%Y")
| stats dc(_time) as TotalMinutes sum(failures) as TotalFailures by day service
| eval Availability=100-round(TotalFailures*100/TotalMinutes,2)
| chart values(Availability) over day by service
That's perfect, Thank you again!
This is a great solution! Thank you for your help