Dear Splunkers,
I would like to ask for your support in adapting my search query so that it returns results only if the downtime lasts for a specific time window, e.g. 3 consecutive days.
My search query is the following:
| table _time, status, component_hostname, uptime
| sort by _time asc
| streamstats last(status) AS status by component_hostname
| sort by _time asc
| reverse
| delta uptime AS Duration
| reverse
| eval Duration=abs(round(Duration/60,4))
| search uptime=0
With this I was able to identify components with uptime=0.
Now I would like to extend my query to display results only when a specific component has uptime=0 for several consecutive days, e.g. 2 or 3 days.
Thank you
How (in non-SPL terms) do you determine what the downtime for a component is?
Hello, I've adjusted my query as follows:
| bin span=3h _time
| stats values(uptime) AS Uptime BY _time, component_hostname
This way I get all uptimes listed in 3-hour spans by component_hostname. See the table:
_time | component_hostname | uptime |
2024-11-11 15:00 | router | 0.00000 |
You can see there are results which include different uptimes, e.g. 0..., 1.... or 5....
Now I would like to create an alert that displays only the component_hostname values which had no uptime other than 0 for 1 day.
Thank you
| bin span=3h _time
| stats values(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0
Hello ITW, thank you for your reply.
where Uptime=0 won't solve it, because during the 1-day span some component_hostnames were up for a few seconds, e.g. 1.0000 or 5.0000. This means they can't be counted as permanently down.
My query should return only the component_hostnames which had no Uptime other than 0 within a span of 1 day.
Stives
Please give a detailed example of what you want showing why where uptime=0 doesn't work for you.
Hello, see table below please. There are results for components A, B and C:
_time | component_hostname | uptime |
2024-11-11 15:00 | Host A | 0.00000 |
2024-11-11 15:00 | Host B | 0.00000 |
2024-11-11 15:00 | Host C | 0.00000 |
If I apply where uptime=0, my results look like this:
_time | component_hostname | uptime |
2024-11-11 15:00 | Host A | 0.00000 |
2024-11-11 15:00 | Host B | 0.00000 |
2024-11-11 15:00 | Host C | 0.00000 |
But this is not what I need, because component A was also showing uptimes of 1.00000 and 5.00000 during my span. The same applies to component B, which was showing uptimes of 0.00000 and 1.00000. This means that components A and B were up during my span, and that is OK. I'm interested only in components which showed no value other than 0 during the span, e.g. component C. This way I know that components A and B are responding during my span, but component C is not responding because it is always 0.
Try using the max function instead of values.
| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0
Thank you for the feedback, but yet again this returns uptimes regardless of their length (0, 1 or more).
If I use where Uptime=0, it shows me uptimes of length 0, but that does not necessarily mean there were no uptimes of 1, 2 or any other length during the span.
I need my result to return those component_hostnames which had no length other than 0 (no 1, 2 or anything else).
This is how I would know whether a component is UP or DOWN during my span.
Please share your full search which is not working for you
| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0
There doesn't appear (from what you have shared) to be anything that you are doing wrong
I agree. The combination of stats max(uptime) and where Uptime=0 should show only hosts with zero up time.
Is there something pertinent that is not being shared?
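For the original goal of flagging hosts that stay at zero for several consecutive days, the max-based check could be extended roughly as below. This is only a sketch, assuming the fields uptime and component_hostname from the searches above; max_uptime, is_down and consecutive_days are names made up for the example, and the threshold of 3 is just the value mentioned in the question. It computes the daily maximum per host, marks a day as down when that maximum is 0, and counts how many down days follow each other without interruption.

```spl
| bin span=1d _time
| stats max(uptime) AS max_uptime BY _time, component_hostname
| eval is_down=if(max_uptime=0, 1, 0)
| sort 0 component_hostname _time
| streamstats reset_on_change=true count AS consecutive_days BY component_hostname, is_down
| where is_down=1 AND consecutive_days>=3
```

Note that a day with no events at all for a host produces no row, so it neither counts as a down day nor breaks the run. If missing days matter, they would need to be filled in before the streamstats step.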
Hello Richgalloway,
thank you for the feedback. I've managed to set my time window with the Uptime results. Now I have an issue with my span: I would like to see _time and Uptime in seconds in one row only. I tried to achieve this by setting the time picker to the last 3 days and my span to 72 hours, so that I get one row with all the results.
| bin span=72h _time
My oldest time should then always be 3 days back.
But when I do this, my results also include a time which is outside of the 3 days (see attachment). My oldest results should end on 18.11.24 in the morning, but instead it also shows results for 17.11.24. In this case I get 2 rows instead of one, which breaks my approach, as I need a single row with the results.
Why is that? Can you suggest something? How exactly does the span function work?
You could use the advanced time picker and set earliest to "@d-3d" and latest to "@d". The @d snaps to the beginning of the current day, and the -3d then goes back a further 3 days (usually 72h, but across daylight saving changes this may be slightly different). The same may apply to the span, so try using 3d rather than 72h.
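If the goal is simply one row per host for the whole window, one option is to skip bin altogether and let the search time range define the window. As far as I understand, bin places events into buckets with fixed boundaries rather than boundaries that start at the beginning of the search window, which is likely why a 72h span can end up split across two rows. A sketch, assuming a placeholder index name your_index and the fields used earlier in the thread; first_seen and last_seen are illustrative names:

```spl
index=your_index earliest=-3d@d latest=@d
| stats max(uptime) AS Uptime, min(_time) AS first_seen, max(_time) AS last_seen BY component_hostname
| where Uptime=0
| fieldformat first_seen=strftime(first_seen, "%Y-%m-%d %H:%M")
| fieldformat last_seen=strftime(last_seen, "%Y-%m-%d %H:%M")
```

This should return a single row for each host whose uptime never rose above 0 in the last three full days, together with the first and last event times seen for it.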
Adding to @ITWhisperer's question: remember that if you're detecting downtime as a lack of events, you cannot detect a downtime that is longer than your search window at all (unless you're using a list of values to compare your results to), or at least you cannot determine its real length beyond your search window.
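The "list of values to compare your results to" could look something like the sketch below. It assumes a lookup file named expected_components.csv with a component_hostname column, which is purely hypothetical and would have to be created and maintained separately; your_index is a placeholder as well. Hosts from the lookup are appended with a count of 0, so any host that sent no events in the window ends up with a total of 0.

```spl
index=your_index earliest=-3d@d latest=@d
| stats count BY component_hostname
| append [| inputlookup expected_components.csv | fields component_hostname | eval count=0]
| stats sum(count) AS count BY component_hostname
| where count=0
```

This only shows which expected hosts were silent within the window. It still cannot tell how long they had been silent before the window started.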
How (in non-SPL terms) do you determine what the downtime for a component is?
Hi ITWhisperer,
downtime is represented by every value starting with 0,00, no matter how many decimals.
BR
Value 0,00 of which field(s)?
Hi,
I know it's a bit confusing, but when I run my query, the field Uptime has the value 0,00 by _time. It does not matter how many decimals come after the 0.