Alerting

Server Down and Up Alert

Chirag812
Explorer

Hello,

I have created separate server down and up alerts: the down alert triggers when percentile80 > 5, and the up alert triggers when percentile80 < 5.

But I want to create one combined alert that triggers every time the server is down, and sends only one up alert (recovery alert) once the server is up again. In other words, it should not trigger multiple up alerts until the server goes down again.

Is there any way to get this done?

Below is the query.

The time range is the last 15 minutes and the cron schedule is */2 * * * * (every 2 minutes).

index=xyz sourcetype=xyz host=*
| eval RespTime=time_taken/1000
| eval RespTime = round(RespTime,2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as "Percentile_80" by _time
| eval Server_Status=if(Percentile_80>=5, "Server Down", "Server UP")


So the alert above should trigger when the server is down, and keep triggering every 2 minutes until it is up. Then it should trigger only once when the server is up again, and not every 2 minutes, until the server goes down again.


P_vandereerden
Splunk Employee

One possible solution would be to use a lookup (status_lookup) to keep track of the last known state.  This solution adds a host field so it can work for more than one host.

Step 1:
Create a KVStore (or file-based) lookup with the fields "host" and "Server_Status", which is the field name the search below reads back and writes. (Note: the solution below will also add an alert message field, but that's more of a side effect.)
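For the file-based option, one way to create the initial file is from a search. This is only a sketch: the host value "a" and the starting status are placeholders, and it assumes a lookup definition named status_lookup pointing at status_lookup.csv is created separately (Settings > Lookups). A KVStore collection cannot be created this way and has to be defined in the app's configuration.

```spl
| makeresults
| eval host="a", Server_Status="Server Up"
| table host Server_Status
| outputlookup status_lookup.csv
```

The alert search overwrites the lookup on every run, so the seed rows only matter for the first execution.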

Step 2:
Add "host" to the group-by clause, and add the lookup commands to your SPL:

index=xyz sourcetype=xyz host=*
| eval RespTime=time_taken/1000
| eval RespTime = round(RespTime,2)
| bucket _time span=2m
| stats avg(RespTime) as Average perc80(RespTime) as "Percentile_80" by _time host
| eval Current_Server_Status=if(Percentile_80>=5, "Server Down", "Server Up")  
| lookup status_lookup host
| eval alert=case(Current_Server_Status="Server Down", host." is down",
                 (Current_Server_Status="Server Up" AND Server_Status="Server Down"), host." is back up")
| rename Current_Server_Status AS Server_Status 
| table host Server_Status alert 
| outputlookup status_lookup


You'll end up with a search that outputs something like this (and updates the lookup for the next alert run):

+---------------+--------------+------+
| Server_Status | alert        | host |
+---------------+--------------+------+
| Server Down   | a is down    | a    |
| Server Up     | b is back up | b    |
| Server Up     |              | c    |
| Server Down   | d is down    | d    |
+---------------+--------------+------+

Note that host c has no alert message because it went from "up" to "up" with the sample data I used.
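To turn this into a single alert, one possible variant (a sketch, not tested against your data) keeps only the most recent 2-minute bucket per host and, after the lookup has been updated, drops the rows with no alert message. The field name Previous_Server_Status is introduced here for readability, and the alert is assumed to use the trigger condition "number of results is greater than 0".

```spl
index=xyz sourcetype=xyz host=*
| eval RespTime=round(time_taken/1000, 2)
| bucket _time span=2m
| stats perc80(RespTime) as Percentile_80 by _time host
| sort 0 - _time
| dedup host
| eval Current_Server_Status=if(Percentile_80>=5, "Server Down", "Server Up")
| lookup status_lookup host OUTPUT Server_Status AS Previous_Server_Status
| eval alert=case(Current_Server_Status="Server Down", host." is down",
                  (Current_Server_Status="Server Up" AND Previous_Server_Status="Server Down"), host." is back up")
| rename Current_Server_Status AS Server_Status
| table host Server_Status alert
| outputlookup status_lookup
| where isnotnull(alert)
```

With this shape, a host that is down returns a row on every run, a host that has just recovered returns one "is back up" row, and a host that stays up returns nothing. Because outputlookup runs before the final where, the state of every host is still saved. Keep in mind that the newest bucket may only be partially filled when the search runs.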

Paul van der Eerden,
Breaking software for over 20 years.