Alerting

Server Down and Up Alert

Chirag812
Loves-to-Learn

Hello,

I have created separate server down and server up alerts. The down alert triggers when percentile80 > 5, and the up alert triggers when percentile80 < 5.

But I want to create one combined alert that triggers on every run while the server is down, and sends only one up alert (recovery alert) once the server is up again. In other words, it should not send repeated up alerts until the server goes down again.

Is there any way to get this done?

Below is the query:

The time range is the last 15 minutes and the cron schedule is */2 * * * * (every 2 minutes).

index=xyz sourcetype=xyz host=*
| eval RespTime=time_taken/1000
| eval RespTime = round(RespTime,2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as "Percentile_80" by _time
| eval Server_Status=if(Percentile_80>=5, "Server Down", "Server UP")


So the above alert should trigger when the server is down, and it should keep triggering every 2 minutes until the server is up. Then the alert should trigger only once when the server is up again. After that, it should stay quiet until the server goes down again.


P_vandereerden
Splunk Employee

One possible solution would be to use a lookup (status_lookup) to keep track of the last known state.  This solution adds a host field so it can work for more than one host.

Step 1:
Create a KV Store (or file-based) lookup with the fields "host" and "Server_Status", which is the field name the search below reads and writes. (Note: the solution below will also add an "alert" message field, but that's more of a side effect.)

Step 2: 
Add "host" to the group-by clause, and add the lookup commands, in your SPL:

index=xyz sourcetype=xyz host=*
| eval RespTime=time_taken/1000
| eval RespTime = round(RespTime,2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as "Percentile_80" by _time host
| eval Current_Server_Status=if(Percentile_80>=5, "Server Down", "Server Up")  
| lookup status_lookup host
| eval alert=case(Current_Server_Status="Server Down", host." is down",
                  (Current_Server_Status="Server Up" AND Server_Status="Server Down"), host." is back up")
| rename Current_Server_Status AS Server_Status 
| table host Server_Status alert 
| outputlookup status_lookup


You'll end up with a search that outputs something like this (and updates the lookup for the next alert run):

+---------------+--------------+------+
| Server_Status | alert        | host |
+---------------+--------------+------+
| Server Down   | a is down    | a    |
| Server Up     | b is back up | b    |
| Server Up     |              | c    |
| Server Down   | d is down    | d    |
+---------------+--------------+------+
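
If you want to check the alert logic before creating the lookup, the sample rows above can be reproduced with generated data. This is only a sketch: Previous_Status stands in for the Server_Status value that would come back from the lookup, and the previous states I picked for hosts a and d are assumptions, since only b and c are pinned down by the sample output. Current_Status and n are names I made up for the dry run.

```
| makeresults count=4
| streamstats count AS n
| eval host=case(n=1,"a", n=2,"b", n=3,"c", n=4,"d")
| eval Previous_Status=case(n=1,"Server Up", n=2,"Server Down", n=3,"Server Up", n=4,"Server Down")
| eval Current_Status=case(n=1,"Server Down", n=2,"Server Up", n=3,"Server Up", n=4,"Server Down")
| eval alert=case(Current_Status="Server Down", host." is down",
                  Current_Status="Server Up" AND Previous_Status="Server Down", host." is back up")
| table host Previous_Status Current_Status alert
```

A down host gets a message on every run regardless of its previous state, while an up host gets one only on the down-to-up transition.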

Note that host c has no alert message because it went from "up" to "up" with the sample data I used.
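
One thing to watch for, as far as I can tell: with a 15-minute time range and 2-minute buckets, the stats command returns several rows per host, so the lookup would end up with several rows per host and the comparison against the stored state can become ambiguous. A possible variant, offered as an untested sketch rather than a drop-in replacement, keeps only the newest bucket per host, reads the stored state into its own field, and returns only rows that carry a message so the alert can trigger on "number of results is greater than 0". Previous_Status and Current_Status are my own field names, not something Splunk requires.

```
index=xyz sourcetype=xyz host=*
| eval RespTime=round(time_taken/1000, 2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as Percentile_80 by _time host
| sort 0 - _time
| dedup host
| eval Current_Status=if(Percentile_80>=5, "Server Down", "Server Up")
| lookup status_lookup host OUTPUT Server_Status AS Previous_Status
| eval alert=case(Current_Status="Server Down", host." is down",
                  Current_Status="Server Up" AND Previous_Status="Server Down", host." is back up")
| rename Current_Status AS Server_Status
| table host Server_Status alert
| outputlookup status_lookup
| where isnotnull(alert)
```

The where clause comes after outputlookup on purpose, so that hosts going from up to up still have their state saved even though they return no result. Keep in mind that outputlookup replaces the lookup contents, so a host with no events in the time range drops out of the stored state, and that the newest 2-minute bucket may still be filling when the search runs.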

Paul van der Eerden,
Breaking software for over 20 years.