Custom Alerting

xxhavok1xx
Explorer

I am currently sending all Cisco ACE load balancer syslogs to my Splunk server.

Within Splunk, I have two separate real-time alerts - one alert notifies me via email when a certain server goes down and a separate alert notifies me when the server comes back up.

Is it possible to create a custom alert so that I am only notified if the server does not come back up after being down for more than X hours? Receiving separate up and down alerts is very annoying, and sometimes there are so many emails that I can't tell whether an UP alert matches a DOWN alert.

If this is possible, how would I go about implementing it? Thanks

To provide a little more detail, here is exactly what my real-time alerts look like:
Alert 1 - "Particular Server Name" Changed State to DOWN - send email
Alert 2 - "Particular Server Name" Changed State to UP - send email
Where the server name is an arbitrary name of a server that wouldn't mean anything to anybody even if I did copy it directly from my alert.

Sometimes the patching team fails to bring up a server properly and we find out the hard way when somebody complains. I actually have dozens of alerts just like this but for different servers. However, one solution would apply for all of my alerts.

mloven_splunk
Splunk Employee

Try this search:

"Health Probe" "changed state to" | rex "Health\sProbe\s(?<probe_name>[^_]+)_ | rex "changed\sstate\sto\s(?<state>[^\$]+)$ | transaction  fields="probe_name,state"  startswith=UP endswith=DOWN keepevicted=t | search duration > 10800

mloven_splunk
Splunk Employee

Sorry, I forgot the closing quotes on the rex commands (or else Answers ate them). At the end of each rex command, just before the pipe, add a closing quote. You should be adding two quotes: one after +)_ and one after +)$


xxhavok1xx
Explorer

Error in 'SearchParser': Missing a search command before '^'.


mloven_splunk
Splunk Employee

I assume that there are events that show a down message? And that they're pretty much the same text as the up messages you posted (only with a "DOWN" at the end)?

So given an up message of this:
[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected Server Name in serverfarm NY_Serverfarm_01 changed state to UP

and a down message of this:
[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected Server Name in serverfarm NY_Serverfarm_01 changed state to DOWN

And you want to create a transaction based on an up message followed by a down message for the probe name (i.e. "NY")? Is that correct? If so, you'd want something like this:

"Health Probe" "changed state to" | rex "Health\sProbe\s(?<probe_name>[^_]+)_" | rex "changed\sstate\sto\s(?<state>[^\$]+)$" | transaction  fields="probe_name,state" maxspan=180m startswith=UP endswith=DOWN keepevicted=t

Hope that helps!


xxhavok1xx
Explorer

Your assumptions are correct. There is an UP message for every DOWN, or at least there should be.

Specifically, I want an email sent out if an UP message is not received within 3 hours of seeing a DOWN message. This way admins can take action and bring it back up properly.
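
For anyone who wants to reason about this rule outside of Splunk, here is a minimal Python sketch of the logic described above: keep only the most recent state per key and flag the keys whose last state is DOWN and older than the threshold. The function name `overdue_down`, the tuple layout, and the server keys in the example are all hypothetical and exist only for illustration; this is not how Splunk evaluates the alert.

```python
from datetime import datetime, timedelta


def overdue_down(events, now, max_down=timedelta(hours=3)):
    """Return sorted keys whose latest state is DOWN for longer than max_down.

    events: iterable of (timestamp, key, state) tuples, in any order.
    The tuple layout is an assumption made for this sketch.
    """
    latest = {}
    for ts, key, state in events:
        # Remember only the most recent event seen for each key.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, state)
    return sorted(
        key
        for key, (ts, state) in latest.items()
        if state == "DOWN" and now - ts > max_down
    )


# Hypothetical data: srv-a recovers after 10 minutes, srv-b never does.
t0 = datetime(2024, 1, 1, 0, 0, 0)
events = [
    (t0, "srv-a", "DOWN"),
    (t0 + timedelta(minutes=10), "srv-a", "UP"),
    (t0, "srv-b", "DOWN"),
]
print(overdue_down(events, now=t0 + timedelta(hours=4)))
```

Only a key whose DOWN was never followed by an UP within the window is reported, so a matched DOWN/UP pair produces no notification.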


xxhavok1xx
Explorer

Mike, per our discussion, here is what an actual log in Splunk looks like.

[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected Server Name in serverfarm NY_Serverfarm_01 changed state to UP

Another example would be this:

[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:8080_PROBE detected Server Name in serverfarm NY_Serverfarm_02 changed state to UP

So you can see, we need "NY" and UP or DOWN to be extracted so they can be referenced in your transaction fields expression. We can't use server farms or server names because there are too many, but the beginning of the probe name is always the same: NY in this case.
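
One way to sanity-check the extraction patterns before pasting them into rex is to run them against the sample lines offline. The sketch below does that in Python, under the assumption that the patterns behave the same way in Splunk's regex engine; note that Python spells named groups `(?P<name>...)` where rex uses `(?<name>...)`. The helper name `extract` is hypothetical, and the state pattern uses `\S+` as a simplification.

```python
import re

# Same idea as the rex patterns in this thread, in Python's named-group syntax.
PROBE_RE = re.compile(r"Health\sProbe\s(?P<probe_name>[^_]+)_")
STATE_RE = re.compile(r"changed\sstate\sto\s(?P<state>\S+)$")


def extract(line):
    """Return (probe_name, state), or None if either pattern does not match."""
    probe = PROBE_RE.search(line)
    state = STATE_RE.search(line)
    if probe is None or state is None:
        return None
    return probe.group("probe_name"), state.group("state")


# Sample lines copied from this thread (placeholders left as posted).
samples = [
    "[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected "
    "Server Name in serverfarm NY_Serverfarm_01 changed state to UP",
    "[Date] [Time] [Server IP] : [Tag]: Health Probe NY_HTTP:80_PROBE detected "
    "Server Name in serverfarm NY_Serverfarm_01 changed state to DOWN",
]
for line in samples:
    print(extract(line))
```

If a line comes back as `None`, the pattern needs adjusting before the transaction can group anything on those fields.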


mloven_splunk
Splunk Employee

"Particular Server Name" "Changed State to" (DOWN OR UP) | transaction fields="dvc,state" maxspan=180m startswith=Down endswith=Up keepevicted=t

This assumes a few things:

  1. You are correctly extracting the server name as a field called "dvc". Feel free to change that to whatever you want.
  2. The state (up or down) is being extracted as a field called "state". Again, change that if you'd like.
  3. In "X amount of hours", X=3. Adjust accordingly.

If that search works correctly, save it and set up an alert.

Hope that helps.

mloven_splunk
Splunk Employee

By the way, if dvc and state aren't being extracted, you can do that within your search. Note that rex needs named capture groups to create the fields:

"Particular Server Name" "Changed State to" (DOWN OR UP) | rex "^(?<dvc>[^\s]+)\s" | rex "Changed\sState\sto\s(?<state>[^\s]+)" | transaction fields="dvc,state" maxspan=180m startswith=Down endswith=Up keepevicted=t


xxhavok1xx
Explorer

The Cisco ACE module has probes configured on the device to check the status of any particular server. That probe information is written to the syslogs. My alerts are based on the probe messages that I see in Splunk.


linu1988
Champion

How do you know the server is down? I mean, is there anything you do to check its status?
