Hi, I'm looking for a way to measure the uptime of a service we run. The tricky part for me is that we have expected downtime for various maintenance work, and that should not count against us. We only want to track unexpected downtime. The other problem is that the maintenance schedule isn't very consistent, so I can't just say to ignore downtime every Monday, for example. Does anyone have a clever way to handle this? Ideally, something that is at least somewhat automated.
Create a form in Splunk for marking scheduled downtime that writes each window to a KV Store collection. Then use a lookup against that collection to drop events that fall inside the marked time periods. The hard part is getting people to use it.
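To make the "drop out marked periods" step concrete, here is a rough sketch of the arithmetic in Python rather than SPL. It is not Splunk code: the function names (`merge_windows`, `clip`, `unexpected_downtime`, `uptime_percent`) and the idea of representing outages and maintenance windows as `(start, end)` tuples are assumptions made for illustration. It also assumes that maintenance time is removed from both the downtime and the measured period, which is one reasonable reading of dropping those events entirely.

```python
from datetime import datetime, timedelta

def merge_windows(windows):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def clip(intervals, start, end):
    """Keep only the parts of each interval that fall inside [start, end]."""
    return [(max(s, start), min(e, end))
            for s, e in intervals if s < end and e > start]

def unexpected_downtime(outages, maintenance):
    """Total outage time that is not covered by any maintenance window."""
    windows = merge_windows(maintenance)
    total = timedelta(0)
    for out_start, out_end in merge_windows(outages):
        covered = timedelta(0)
        for win_start, win_end in windows:
            overlap = min(out_end, win_end) - max(out_start, win_start)
            if overlap > timedelta(0):
                covered += overlap
        total += (out_end - out_start) - covered
    return total

def uptime_percent(period_start, period_end, outages, maintenance):
    """Uptime over the period, ignoring maintenance time entirely."""
    maint = merge_windows(clip(maintenance, period_start, period_end))
    outs = clip(outages, period_start, period_end)
    maint_time = sum((e - s for s, e in maint), timedelta(0))
    measured = (period_end - period_start) - maint_time
    if measured <= timedelta(0):
        raise ValueError("the whole period is marked as maintenance")
    return 100.0 * (1 - unexpected_downtime(outs, maint) / measured)
```

For example, with a 100-hour period, one maintenance window covering hours 10 to 20, and outages covering hours 15 to 22 and 50 to 51, only 3 hours count as unexpected downtime out of 90 measured hours. If you would rather keep the full period as the denominator, replace `measured` with `period_end - period_start`.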