Down Server Interface Alert

Timmac

Hey guys,
I'm trying to set up an alert that sends an email when an interface goes down and does not come back up within a certain timeframe. I'm assuming 10-15 minutes should suffice. We're having an issue where, after running updates or rebooting a server, the interface sometimes does not come up properly.

I'm pretty bad with the searching logic, so I could really use some help! Thanks much. Below are the logs I'm working with, captured during a test down state on one of the interfaces. They report a couple of ups but only one down.

This was a test run showing what the logs look like when searching for USPK10OLLBS01 and /Common/tcp:

2/3/15
10:29:00.000 AM
Feb 3 10:29:00 10.10.0.19 Feb 3 10:29:06 uspk10ollbs01 notice mcpd[6642]: 01070727:5: Pool /Common/UAT-BTS-Batch member /Common/USPK10OLBTSBA02:80 monitor status up. [ /Common/tcp: up ] [ was node down for 0hr:0min:3sec ]
host = 10.10.0.19 source = udp:514 sourcetype = syslog
2/3/15
10:28:57.000 AM
Feb 3 10:28:57 10.10.0.19 Feb 3 10:29:03 uspk10ollbs01 notice mcpd[6642]: 01070638:5: Pool /Common/UAT-BTS-Batch member /Common/USPK10OLBTSBA02:80 monitor status node down. [ /Common/tcp: up ] [ was down for 0hr:0min:16sec ]
host = 10.10.0.19 source = udp:514 sourcetype = syslog
2/3/15
10:28:41.000 AM
Feb 3 10:28:41 10.10.0.19 Feb 3 10:28:47 uspk10ollbs01 notice mcpd[6642]: 01070638:5: Pool /Common/UAT-BTS-Batch member /Common/USPK10OLBTSBA02:80 monitor status down. [ /Common/tcp: down ] [ was up for 856hrs:4mins:2sec ]

somesoni2

Try something like this (assuming the host name is NOT already extracted; if it is, drop the HostName capture group from the regex):

your base search | rex "(?<HostName>\w+)\snotice.*was (?:node )?down for (?<hour>\d+)hrs?:(?<minute>\d+)mins?:(?<second>\d+)secs?\s*\]" | eval Downtime=round((hour*3600 + minute*60 + second)/60,2) | where Downtime>15
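To check the rex against the sample events outside Splunk, here is a minimal Python sketch of the same rex | eval | where logic. It is only an illustration, not Splunk itself: the function names downtime_minutes and long_outages are hypothetical, Python spells named groups (?P<name>...) instead of (?<name>...), and the pattern accepts both "0hr:0min:16sec" and "856hrs:4mins:2sec" (plus "was node down for") because the sample messages use all of those forms.

```python
import re

# Same idea as the rex: host name just before "notice", then the duration
# from "was down for" / "was node down for". hrs?/mins?/secs? accept both
# the singular ("0hr:0min") and plural ("856hrs:4mins") spellings.
DOWN_RE = re.compile(
    r"(?P<HostName>\w+)\snotice.*was (?:node )?down for "
    r"(?P<hour>\d+)hrs?:(?P<minute>\d+)mins?:(?P<second>\d+)secs?\s*\]"
)


def downtime_minutes(line):
    """Return (host, downtime in minutes rounded to 2 places), or None if no match.

    Mirrors: eval Downtime=round((hour*3600 + minute*60 + second)/60,2)
    """
    m = DOWN_RE.search(line)
    if m is None:
        return None
    total_seconds = int(m["hour"]) * 3600 + int(m["minute"]) * 60 + int(m["second"])
    return m["HostName"], round(total_seconds / 60, 2)


def long_outages(lines, threshold_minutes=15):
    """Mirrors: where Downtime>15 (keeps only events above the threshold)."""
    results = []
    for line in lines:
        parsed = downtime_minutes(line)
        if parsed is not None and parsed[1] > threshold_minutes:
            results.append(parsed)
    return results
```

Run against the three messages quoted in the question, the "up" message yields 0.05 minutes, the "node down" message yields 0.27 minutes, and the "was up for 856hrs..." message does not match, so nothing passes the 15-minute filter for that short test outage.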

You can schedule this search in Splunk and set up an alert on it:
http://docs.splunk.com/Documentation/Splunk/6.2.1/Alert/Setupalertactions#Configure_email_notificati...
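One caveat worth checking: in the sample logs, the "was ... down for" duration only appears on messages logged after the member leaves the plain down state (the "node down" and "up" lines), so a member that never comes back up may never produce a line for the search above to match. A different approach, sketched below under that assumption, is to track each member's most recent status and flag it if that status is still not "up" after 15 minutes (in SPL this would likely be built around stats latest(...) BY member). The names STATUS_RE and still_down, and the (timestamp, raw line) input shape, are hypothetical and only illustrate the logic.

```python
import re
from datetime import timedelta

# Hypothetical pattern: pulls the pool member and its new status out of an
# mcpd "monitor status" message, e.g. "... member /Common/X:80 monitor status down. [".
STATUS_RE = re.compile(r"member (?P<member>\S+) monitor status (?P<status>[^.]+)\.")


def still_down(events, now, threshold=timedelta(minutes=15)):
    """Return members whose latest status is not 'up' and is older than threshold.

    events: iterable of (timestamp: datetime, raw_line: str), in any order.
    """
    latest = {}  # member -> (timestamp, status) of its most recent event
    for ts, line in events:
        m = STATUS_RE.search(line)
        if m is None:
            continue  # not a monitor status message
        member = m["member"]
        if member not in latest or ts > latest[member][0]:
            latest[member] = (ts, m["status"])
    # A member is flagged only if it has stayed in a non-up state too long.
    return sorted(
        member
        for member, (ts, status) in latest.items()
        if status != "up" and now - ts > threshold
    )
```

With all three sample events, the latest status is "up", so nothing is flagged; if only the 10:28:41 "down" event existed and the check ran at 10:45, /Common/USPK10OLBTSBA02:80 would be reported.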