Custom Alerting

xxhavok1xx · 08-15-2013

I am currently sending all Cisco ACE load balancer syslogs to my Splunk server.

Within Splunk, I have two separate real-time alerts: one emails me when a certain server goes down, and the other emails me when that server comes back up.

Is it possible to create a custom alert that notifies me only if the server has not come back up more than X hours after going down? Receiving every UP and DOWN alert is very annoying, and sometimes there are so many emails that I can't tell which UP alert matches which DOWN alert.

If this is possible, how would I go about implementing it? Thanks!

To provide a little more detail, here is exactly what my real-time alerts look like:
Alert 1 - "Particular Server Name" Changed State to DOWN - send email
Alert 2 - "Particular Server Name" Changed State to UP - send email
"Particular Server Name" is a placeholder; the real server name wouldn't mean anything to anybody even if I copied it directly from my alert.

Sometimes the patching team fails to bring a server back up properly, and we find out the hard way when somebody complains. I actually have dozens of alerts just like these for different servers, so ideally one solution would apply to all of them.

mloven_splunk · 09-24-2013

Try this search:

"Health Probe" "changed state to" | rex "Health\sProbe\s(?<probe_name>[^_]+)_" | rex "changed\sstate\sto\s(?<state>[^\$]+)$" | transaction probe_name startswith=UP endswith=DOWN keepevicted=true | search duration > 10800

mloven_splunk · 09-24-2013

Sorry, I forgot the closing quotes on the rex commands (or else Answers ate them). At the end of each rex pattern, just before the pipe, put a closing quote. You should be adding two quotes: one after +)_ and one after +)$
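The idea behind transaction plus the duration filter is to pair state changes per probe and measure the time between them. Below is a minimal Python sketch of that pairing, assuming the events have already been reduced to (timestamp, probe_name, state) tuples and using the DOWN-then-UP order the original question asks about. It is an offline illustration of the logic, not Splunk's transaction implementation; `outages`, `long_outages`, `THRESHOLD` and the "LA" probe are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical pre-parsed events: (timestamp, probe_name, state).
# In Splunk these fields would come from the two rex commands.
Event = Tuple[datetime, str, str]

# 3 hours, the same threshold as "duration > 10800" (seconds).
THRESHOLD = timedelta(seconds=10800)


def outages(events: List[Event]) -> List[Tuple[str, datetime, datetime]]:
    """Pair each DOWN with the next UP for the same probe.

    Returns (probe_name, down_time, up_time) for every completed outage.
    Repeated DOWNs before an UP keep the earliest DOWN time; an UP with
    no preceding DOWN is ignored.
    """
    open_down: Dict[str, datetime] = {}
    completed = []
    for ts, probe, state in sorted(events):  # oldest first
        if state == "DOWN":
            open_down.setdefault(probe, ts)
        elif state == "UP" and probe in open_down:
            completed.append((probe, open_down.pop(probe), ts))
    return completed


def long_outages(events: List[Event]) -> List[Tuple[str, timedelta]]:
    """Completed outages whose duration exceeded THRESHOLD."""
    return [(probe, up - down)
            for probe, down, up in outages(events)
            if up - down > THRESHOLD]


if __name__ == "__main__":
    t0 = datetime(2013, 9, 24, 8, 0)
    sample = [
        (t0, "NY", "DOWN"),
        (t0 + timedelta(minutes=20), "NY", "UP"),
        (t0 + timedelta(hours=1), "LA", "DOWN"),  # "LA" is a made-up probe prefix
        (t0 + timedelta(hours=5), "LA", "UP"),
    ]
    print(long_outages(sample))  # [('LA', datetime.timedelta(seconds=14400))]
```

Note that in this sketch a DOWN with no UP yet never appears in the result, because it has no end time and therefore no duration; that open case is exactly what the alert in this thread needs to catch.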

xxhavok1xx · 09-24-2013

Error in 'SearchParser': Missing a search command before '^'.

mloven_splunk · 09-24-2013

I assume there are events that show a DOWN message, and that they're essentially the same text as the UP messages you posted (only with "DOWN" at the end)?

So given an UP message like this:
[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected Server Name in serverfarm NY_Serverfarm_01 changed state to UP

and a DOWN message like this:
[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected Server Name in serverfarm NY_Serverfarm_01 changed state to DOWN

you want to create a transaction from an UP message followed by a DOWN message for the same probe name (i.e. "NY")? Is that correct? If so, you'd want something like this:

"Health Probe" "changed state to" | rex "Health\sProbe\s(?<probe_name>[^_]+)_" | rex "changed\sstate\sto\s(?<state>[^\$]+)$" | transaction probe_name maxspan=180m startswith=UP endswith=DOWN keepevicted=true

Hope that helps!

xxhavok1xx · 09-24-2013

Your assumptions are correct: there is an UP message for every DOWN... at least there should be.

Specifically, I want an email sent out if an UP message is not received within 3 hours of a DOWN message. That way admins can take action and bring the server back up properly.
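The requirement here (a DOWN with no UP after 3 hours) is easiest to reason about as "latest state per probe". Below is a minimal Python sketch of that check, assuming the same hypothetical (timestamp, probe_name, state) tuples and an explicit `now`; `overdue_probes` and the "LA"/"BOS" probes are made up for illustration, and the sketch is meant for testing the logic offline rather than replacing the Splunk alert. In SPL, one common shape for the same check (not tested against these logs) is a scheduled search over a window longer than 3 hours ending in `stats latest(state) AS last_state latest(_time) AS last_change BY probe_name | where last_state="DOWN" AND last_change < relative_time(now(), "-3h")`, which times the outage from the most recent DOWN rather than the first one.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical pre-parsed events: (timestamp, probe_name, state).
Event = Tuple[datetime, str, str]


def overdue_probes(events: List[Event], now: datetime,
                   limit: timedelta = timedelta(hours=3)) -> List[str]:
    """Probes whose latest state is DOWN and have been down longer than `limit`.

    The outage is timed from the first DOWN since the last UP.
    """
    down_since: Dict[str, datetime] = {}
    for ts, probe, state in sorted(events):  # oldest first
        if state == "DOWN":
            # Keep the start of the outage if several DOWNs arrive in a row.
            down_since.setdefault(probe, ts)
        elif state == "UP":
            # An UP closes any open outage for this probe.
            down_since.pop(probe, None)
    return sorted(p for p, since in down_since.items() if now - since > limit)


if __name__ == "__main__":
    t0 = datetime(2013, 9, 24, 8, 0)
    sample = [
        (t0, "NY", "DOWN"),
        (t0 + timedelta(minutes=30), "NY", "UP"),               # came back: no alert
        (t0 + timedelta(hours=1), "LA", "DOWN"),                # down 3.5 h at "now"
        (t0 + timedelta(hours=3, minutes=30), "BOS", "DOWN"),   # down 1 h at "now"
    ]
    now = t0 + timedelta(hours=4, minutes=30)
    print(overdue_probes(sample, now))  # ['LA']
```

Only probes still in `down_since` can trigger, so a server that bounces DOWN and UP during patching produces no email at all, which is the noise reduction the question asks for.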

xxhavok1xx · 09-24-2013

Mike, per our discussion, here is what an actual log event in Splunk looks like.

[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected Server Name in serverfarm NY_Serverfarm_01 changed state to UP

Another example would be this:

[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:8080_PROBE detected Server Name in serverfarm NY_Serverfarm_02 changed state to UP

As you can see, we need "NY" and UP or DOWN to be extracted so they can be referenced in your transaction field list. We can't use server farms or server names because there are too many, but the beginning of the probe name is always the same: NY in this case.
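Before wiring the extractions into a search, it can help to check the patterns against the sample lines above. Below is a minimal Python sketch, assuming the prefix before the first underscore (e.g. "NY") and the trailing UP/DOWN are all that is needed. Python spells named groups `(?P<name>...)`, a form that PCRE, which `rex` uses, also accepts alongside `(?<name>...)`. The function name `extract` is just for illustration.

```python
import re
from typing import Optional, Tuple

# Prefix of the probe name up to the first underscore, e.g. "NY" in "NY_HTTP:80_PROBE".
PROBE_RE = re.compile(r"Health\sProbe\s(?P<probe_name>[^_]+)_")
# Final state; note "\sto" (whitespace + "to"), not "\to", which would mean tab + "o".
STATE_RE = re.compile(r"changed\sstate\sto\s(?P<state>UP|DOWN)\s*$")


def extract(line: str) -> Optional[Tuple[str, str]]:
    """Return (probe_prefix, state) for an ACE health-probe message, else None."""
    probe = PROBE_RE.search(line)
    state = STATE_RE.search(line)
    if probe is None or state is None:
        return None
    return probe.group("probe_name"), state.group("state")


if __name__ == "__main__":
    sample = ("[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:8080_PROBE "
              "detected Server Name in serverfarm NY_Serverfarm_02 changed state to UP")
    print(extract(sample))  # ('NY', 'UP')
```

Grouping on the prefix alone gives every NY probe (port 80 and port 8080 alike) one shared key, so a DOWN on one and an UP on the other would pair up. If that matters, capturing up to `_PROBE` instead, e.g. `(?P<probe_name>\S+)_PROBE`, would keep `NY_HTTP:80` and `NY_HTTP:8080` apart.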

mloven_splunk · 09-13-2013

"Particular Server Name" "Changed State to" (DOWN OR UP) | transaction dvc maxspan=180m startswith=eval(state="DOWN") endswith=eval(state="UP") keepevicted=true

This assumes a few things:

You are correctly extracting the server name as a field called "dvc". Feel free to change that to whatever you want.
The state (up or down) is being extracted as a field called "state". Again, change that if you'd like.
In "X amount of hours", X=3, which is where maxspan=180m comes from. Adjust accordingly.

If that search works correctly, save it and set up an alert.

Hope that helps.

mloven_splunk · 09-13-2013

By the way, if dvc and state aren't being extracted, you can do that within your search.

"Particular Server Name" "Changed State to" (DOWN OR UP) | rex "^(?<dvc>[^\s]+)\s" | rex "Changed\sState\sto\s(?<state>[^\s]+)" | transaction dvc maxspan=180m startswith=eval(state="DOWN") endswith=eval(state="UP") keepevicted=true

xxhavok1xx · 08-15-2013

The Cisco ACE module has probes configured on the device to check the status of each server. The probe results show up in the syslogs, and my alerts are based on the probe messages I see in Splunk.

linu1988 · 08-15-2013

How do you know the server is down? I mean, is there anything you do to check the status?
