We are creating a solution to monitor servers that are behind a network load balancer (NLB). The NLB sends health probes to the servers every five seconds. When a server fails to respond to the probe, the NLB generates a syslog message. It's fairly straightforward to configure an alert in Splunk that will send an email to the server team when these "health probe failed" messages are received; however, we need to account for routine server maintenance....a way to put things in "maintenance mode" so to speak so that alerts aren't generated.
I'm open to ideas on this. Anyone done this sort of thing before?
We could create a lookup table file that gets populated with servers that are down for maintenance, but we would need an easy way to modify this. Ideally, a user-friendly way for the server operators to do this themselves. I'm thinking of something like a web form where they can enter a server name or address, click submit, and have it dynamically added to the lookup table file. Of course, they would need to be able to remove it from the file as well when their maintenance is over to re-enable alerting for the server.