Splunk Enterprise

Search query for downtime lasting several days

Stives
Explorer

Dear Splunkers,

I would like to ask for your help adapting my search query so that it returns results only when downtime lasts for a specific time window, e.g. 3 consecutive days.
My search query is as follows:

| table _time, status, component_hostname, uptime
| sort by _time asc
| streamstats last(status) AS status by component_hostname
| sort by _time asc
| reverse
| delta uptime AS Duration
| reverse
| eval Duration=abs(round(Duration/60,4))

| search uptime=0



With this I was able to identify components with uptime=0.
Now I would like to extend my query to return results only when a specific component has uptime=0 for several consecutive days, e.g. 2 or 3 days.
Thank you


Stives
Explorer

Hello, I've adjusted my query as follows:

| bin span=3h _time
| stats values(uptime) AS Uptime BY _time, component_hostname


This lists all uptime values in 3-hour spans by component_hostname. See the table:

_time             component_hostname  uptime
2024-11-11 15:00  router              0.00000
                                      1.00000
                                      5.00000


You can see that a single span can contain different uptime values, e.g. 0, 1 or 5.
Now I would like to create an alert that returns only those component_hostname values that had no uptime other than 0 for 1 day.
Thank you

ITWhisperer
SplunkTrust

| bin span=3h _time
| stats values(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0

Stives
Explorer

Hello ITW, thank you for the reply.

where Uptime=0 won't resolve it, because within a 1-day span some component_hostnames were up for a few seconds, e.g. 1.0000 or 5.0000. That means they can't be counted as permanently down.

My query should return only those component_hostnames that had no Uptime value other than 0 within a 1-day span.

Stives
 

ITWhisperer
SplunkTrust

Please give a detailed example of what you want showing why where uptime=0 doesn't work for you.


Stives
Explorer

Hello, see table below please. There are results for components A, B and C:

_time             component_hostname  uptime
2024-11-11 15:00  Host A              0.00000
                                      1.00000
                                      5.00000
2024-11-11 15:00  Host B              0.00000
                                      1.00000
2024-11-11 15:00  Host C              0.00000

If I apply where uptime=0, my results look as follows:

_time             component_hostname  uptime
2024-11-11 15:00  Host A              0.00000
2024-11-11 15:00  Host B              0.00000
2024-11-11 15:00  Host C              0.00000

But this is not what I need, because component A also showed uptimes of 1.00000 and 5.00000 during my span. The same applies to component B, which showed uptimes of 0.00000 and 1.00000. This means components A and B were up during my span, and that is OK. I'm interested only in components that showed no value other than 0 during the span, e.g. component C. That way I know components A and B were responding during my span, but component C was not, because its uptime is always 0.

richgalloway
SplunkTrust

Try using the max function instead of values.

| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0
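
Building on the max approach, the original question about several consecutive days could be sketched roughly as follows. This is only a sketch under assumptions: `<your base search>` stands in for the part of the search that was not shared, the 7-day window and the 3-day threshold are arbitrary choices, and a day counts as down when the highest uptime seen that day is 0. The field names max_uptime, is_down, prev_down, run_start, run_id, down_days, first_day and last_day are made up for illustration. Days on which a host sent no events at all produce no row, so they are skipped rather than counted.

```spl
<your base search> earliest=-7d@d latest=@d
| bin span=1d _time
| stats max(uptime) AS max_uptime BY _time, component_hostname
| eval is_down=if(max_uptime=0, 1, 0)
| sort 0 component_hostname, _time
| streamstats current=f last(is_down) AS prev_down BY component_hostname
| eval run_start=if(isnull(prev_down) OR prev_down!=is_down, 1, 0)
| streamstats sum(run_start) AS run_id BY component_hostname
| stats sum(is_down) AS down_days, min(_time) AS first_day, max(_time) AS last_day BY component_hostname, run_id
| where down_days>=3
| fieldformat first_day=strftime(first_day, "%Y-%m-%d")
| fieldformat last_day=strftime(last_day, "%Y-%m-%d")
```

The first streamstats looks up the previous day's state per host, the eval marks where the state changes, and the second streamstats turns those marks into a run number. Each run of identical days then collapses into one row, and only runs with at least 3 down days remain.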

 

---
If this reply helps you, Karma would be appreciated.

Stives
Explorer

Thank you for the feedback, but this again returns uptimes regardless of their length (0, 1 or more).
If I use where Uptime=0, it shows me uptimes of 0, but that does not necessarily mean there were no uptimes of 1, 2 or any other length within the span.

I need my result to return only those component_hostnames that had no uptime length other than 0 (no 1, 2 or anything else).
That is how I would know whether a component was UP or DOWN during my span.

ITWhisperer
SplunkTrust

Please share your full search which is not working for you


Stives
Explorer
| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0

ITWhisperer
SplunkTrust

There doesn't appear (from what you have shared) to be anything that you are doing wrong

richgalloway
SplunkTrust

I agree.  The combination of stats max(uptime) and where Uptime=0 should show only hosts with zero up time.

Is there something pertinent that is not being shared?


Stives
Explorer

Hello Richgalloway,
thank you for the feedback. I've managed to set my time window with Uptime results. Now I have an issue with my span: I want to see _time and Uptime (in seconds) in one row only. To achieve this, I set the time picker to the last 3 days and my span to 72 hours, so that I get a single row with all the results.

| bin span=72h _time

My oldest time should then always be 3 days back.
But when I do this, my results also include a time outside the 3 days (see attachment). My oldest results should be from the morning of 18.11.24, but results for 17.11.24 are shown as well. So instead of one row I get 2 rows, which breaks my search idea, because I need exactly one row with the results.
Why is that? How exactly does span work?

ITWhisperer
SplunkTrust

You could use the advanced time picker and set earliest to "@d-3d" and latest to "@d". The @d snaps to the beginning of the current day, then the -3d goes back a further 3 days (usually 72h, but across daylight saving changes this may be slightly different). The same may go for the span, so try using 3d rather than 72h.
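
A hedged sketch for the one-row-per-host goal: as far as I know, bin aligns its buckets to fixed clock boundaries rather than to the start of the search window, so a rolling 72-hour window will usually straddle two 72h buckets. If the whole window should end up as a single row, one option is to snap the window to day boundaries and drop bin altogether. `<your base search>` is a placeholder for the unshared part of the search, and first_event and last_event are illustrative field names.

```spl
<your base search> earliest=-3d@d latest=@d
| stats max(uptime) AS Uptime, min(_time) AS first_event, max(_time) AS last_event BY component_hostname
| where Uptime=0
| fieldformat first_event=strftime(first_event, "%Y-%m-%d %H:%M:%S")
| fieldformat last_event=strftime(last_event, "%Y-%m-%d %H:%M:%S")
```

Without a _time field in the BY clause, stats returns exactly one row per component_hostname for the whole window, and first_event and last_event show which period the events actually cover.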

PickleRick
SplunkTrust

Adding to @ITWhisperer's question: remember that if you're detecting downtime as a lack of events, you cannot detect downtimes that cover your entire search window at all (unless you compare your results against a list of expected values), and you cannot determine the real length of downtimes that extend beyond your search window.
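
Comparing against a list of expected values might look like the sketch below. It assumes a lookup file named expected_components.csv with a component_hostname column. That file is hypothetical, as are the expected and state field names and the `<your base search>` placeholder.

```spl
<your base search> earliest=-3d@d latest=@d
| stats max(uptime) AS Uptime BY component_hostname
| append
    [| inputlookup expected_components.csv
     | fields component_hostname
     | eval expected=1]
| stats max(Uptime) AS Uptime, max(expected) AS expected BY component_hostname
| eval state=case(isnull(Uptime), "no events in window", Uptime=0, "down", true(), "up")
| where state!="up"
| table component_hostname, state, Uptime
```

Hosts that appear only in the lookup end up with no Uptime value, so they are reported as "no events in window" instead of silently disappearing from the results.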

ITWhisperer
SplunkTrust

How (in non-SPL terms) do you determine what the downtime for a component is?

Stives
Explorer

Hi ITWhisperer,
downtime is any value of 0,00, no matter how many decimals.
BR

richgalloway
SplunkTrust

Value 0,00 of which field(s)?


Stives
Explorer

Hi,
I know it's a bit confusing, but when I run my query, the field Uptime has the value 0,00 by _time. It does not matter how many decimals follow the 0.
