Splunk Enterprise

Search query for Downtime taking several days

Stives
Explorer

Dear Splunkers,

I would like to ask for your help adapting my search query so that it returns results only when downtime lasts for a specific time window, e.g. 3 consecutive days.
My search query is as follows:

| table _time, status, component_hostname, uptime
| sort by _time asc
| streamstats last(status) AS status by component_hostname
| sort by _time asc
| reverse
| delta uptime AS Duration
| reverse
| eval Duration=abs(round(Duration/60,4))

| search uptime=0

With this I was able to identify components with uptime=0.
Now I would like to extend my query so that it displays results only when a specific component has uptime=0 for several consecutive days, e.g. 2 or 3 days.
Thank you
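
One possible way to express the "N consecutive days" requirement is to reduce each day to a single value per host and then count zero-only days in a sliding window. This is only a sketch: `your_index` and `your_sourcetype` are placeholders, the 7-day search window is an arbitrary choice, and the field names `daily_max`, `is_down` and `down_days` are made up for illustration.

```spl
index=your_index sourcetype=your_sourcetype earliest=-7d@d latest=@d
| bin span=1d _time
| stats max(uptime) AS daily_max BY _time, component_hostname
| eval is_down=if(daily_max=0, 1, 0)
| sort 0 component_hostname, _time
| streamstats window=3 global=false sum(is_down) AS down_days BY component_hostname
| where down_days=3
```

A day counts as "down" only if its highest uptime is 0, and `window=3 global=false` sums the last three daily rows separately per host. Note that a day with no events at all produces no row, so three consecutive rows are not guaranteed to be three consecutive calendar days.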


ITWhisperer
SplunkTrust

How (in non-SPL terms) do you determine what the downtime for a component is?


Stives
Explorer

Hello, I've adjusted my query as follows:

| bin span=3h _time
| stats values(uptime) AS Uptime BY _time, component_hostname


This lists all uptime values in spans of 3 hours by component_hostname. See the table:

_time             component_hostname  uptime
2024-11-11 15:00  router              0.00000, 1.00000, 5.00000


You can see that the results include different uptimes, e.g. 0, 1 or 5.
Now I would like to create an alert that displays only those component_hostname values that had no uptime other than 0 for 1 day.
Thank you

ITWhisperer
SplunkTrust

| bin span=3h _time
| stats values(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0

Stives
Explorer

Hello ITW, thank you for the reply.

where Uptime=0 won't solve it, because during a 1-day span some component_hostnames were up for a few seconds, e.g. 1.0000 or 5.0000. This means it can't be counted as permanent downtime.

My query should look only for component_hostnames that had no Uptime other than 0 within a span of 1 day.

Stives
 

ITWhisperer
SplunkTrust

Please give a detailed example of what you want showing why where uptime=0 doesn't work for you.

Stives
Explorer

Hello, see table below please. There are results for components A, B and C:

_time             component_hostname  uptime
2024-11-11 15:00  Host A              0.00000, 1.00000, 5.00000
2024-11-11 15:00  Host B              0.00000, 1.00000
2024-11-11 15:00  Host C              0.00000

 

If I apply where uptime=0, my results look as follows:

_time             component_hostname  uptime
2024-11-11 15:00  Host A              0.00000
2024-11-11 15:00  Host B              0.00000
2024-11-11 15:00  Host C              0.00000

 

But this is not what I need, because component A also showed uptimes of 1.00000 and 5.00000 during my span. The same applies to component B, which showed uptimes of 0.00000 and 1.00000. This means that components A and B were up during my span, and that is OK. I'm interested only in components that showed no value other than 0 during the span, e.g. component C. This way I know that components A and B were responding during my span, but component C was not, because its uptime is always 0.

richgalloway
SplunkTrust

Try using the max function instead of values.

| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0
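
To check this against the Host A/B/C sample above without touching real data, a self-contained sketch with makeresults may help. Everything here is illustrative: the six rows mirror the sample table, the helper field `n` only exists to build them, and the timestamp is an arbitrary value inside the 15:00 span.

```spl
| makeresults count=6
| streamstats count AS n
| eval component_hostname=case(n<=3, "Host A", n<=5, "Host B", true(), "Host C")
| eval uptime=case(n=1, 0, n=2, 1, n=3, 5, n=4, 0, n=5, 1, n=6, 0)
| eval _time=strptime("2024-11-11 15:10", "%Y-%m-%d %H:%M")
| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0
```

Because max(uptime) is 5 for Host A and 1 for Host B, the final where should leave only Host C. With values(uptime), each row keeps every distinct value seen in the span, so a single Uptime=0 comparison no longer means "only zeros".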

 

---
If this reply helps you, Karma would be appreciated.

Stives
Explorer

Thank you for the feedback, but again this returns uptimes regardless of their length (0, 1 or more).
If I use where Uptime=0, it shows me uptimes of 0, but that does not necessarily mean there were no values of 1, 2 or anything else during the span.

I need my result to return only those component_hostnames that had no value other than 0 (no 1, 2 or anything else).
This is how I would know whether a component was UP or DOWN during my span.

ITWhisperer
SplunkTrust

Please share your full search which is not working for you

Stives
Explorer

| bin span=3h _time
| stats max(uptime) AS Uptime BY _time, component_hostname
| where Uptime=0

ITWhisperer
SplunkTrust
SplunkTrust

There doesn't appear (from what you have shared) to be anything that you are doing wrong

richgalloway
SplunkTrust

I agree.  The combination of stats max(uptime) and where Uptime=0 should show only hosts with zero up time.

Is there something pertinent that is not being shared?


Stives
Explorer

Hello Richgalloway,
thank you for the feedback. I've managed to set my time window with the Uptime results. Now I have an issue with my span: I would like to see _time and Uptime in seconds in one row only. To achieve this, I set the time picker to the last 3 days and my span to 72 hours, so that I get one row with all the results.

| bin span=72h _time

My oldest time should then always be 3 days back.
But when I do this, my results also show a time outside of the 3 days (see attachment). My oldest results should end on 18.11.24 in the morning, but results for 17.11.24 are shown as well. In this case I get 2 rows instead of one, which breaks my search idea, as I need exactly one row with the results.
Why is that? How exactly does the span function work?

ITWhisperer
SplunkTrust

You could use the advanced time picker and select earliest as "@d-3d" and latest as "@d". The @d aligns to the beginning of the current day, then the -3d goes back a further 3 days (usually 72h, but across daylight saving changes this may be slightly different). The same may go for the span, so try using 3d rather than 72h.
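
If the goal is exactly one row per host for the whole window, one option is to leave out bin entirely and let the search time range define the window. As far as I understand, bin snaps _time to fixed bucket boundaries rather than to the start of the search window, so a 3-day window can still straddle two buckets. The sketch below assumes placeholder names `your_index` and `your_sourcetype`, uses `-3d@d` as a common spelling of a day-aligned 3-day window, and invents the fields `first_event` and `last_event` for display.

```spl
index=your_index sourcetype=your_sourcetype earliest=-3d@d latest=@d
| stats max(uptime) AS Uptime, min(_time) AS first_event, max(_time) AS last_event BY component_hostname
| where Uptime=0
| fieldformat first_event=strftime(first_event, "%Y-%m-%d %H:%M")
| fieldformat last_event=strftime(last_event, "%Y-%m-%d %H:%M")
```

Since _time is not in the BY clause, stats returns a single row per component_hostname, and the first and last event times still show which period the row covers.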

PickleRick
SplunkTrust

Adding to @ITWhisperer's question: remember that if you're detecting downtime as a lack of events, you cannot detect downtime longer than your search window at all (unless you compare your results against a list of expected values), or at least you cannot determine its real length beyond your search window.
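
Comparing against a list of expected values could look roughly like the following sketch. It assumes a hypothetical lookup file `expected_components.csv` with a `component_hostname` column; the index, sourcetype, and the `rows` and `state` fields are likewise placeholders.

```spl
index=your_index sourcetype=your_sourcetype earliest=-3d@d latest=@d
| stats max(uptime) AS Uptime BY component_hostname
| append [| inputlookup expected_components.csv | fields component_hostname]
| stats max(Uptime) AS Uptime, count AS rows BY component_hostname
| eval state=case(isnull(Uptime), "no events in window", Uptime=0, "reported only uptime 0", true(), "up at least once")
| where state!="up at least once"
```

Hosts that appear only in the lookup end up with no Uptime value and are flagged as having no events, which separates "silent" components from those that explicitly reported 0.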

ITWhisperer
SplunkTrust

How (in non-SPL terms) do you determine what the downtime for a component is?

Stives
Explorer

Hi ITWhisperer,
downtime is represented by every value starting with 0,00, no matter how many decimals.
BR

richgalloway
SplunkTrust

Value 0,00 of which field(s)?


Stives
Explorer

Hi,
I know it's a bit confusing, but when I run my query the field Uptime has the value 0,00 by _time. It does not matter how many decimals follow the 0.
