I'm exploring ways to make my systems' alerting more efficient, and I've been reading up on Google SRE's burn-rate-based alerting. It is supposed to reduce overall downtime by alerting on the rate at which the error budget is being consumed, rather than on a fixed time window (with time-based alerting, the wait before the alert fires already consumes error budget).
I must admit that even after reading the pages a few times, I still can't grasp the idea of the burn rate. Maybe someone here can enlighten me.
I'm also interested in how burn-rate-based alerting can be implemented in Splunk to reduce detection time.
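For context, here is my rough understanding of the calculation, sketched in Python (all SLO values, event counts, and the function name are illustrative, not from my real system), in case it helps pinpoint where I'm going wrong:

```python
# Burn rate = observed error rate / error budget implied by the SLO.
# A burn rate of 1 means the budget is consumed exactly over the SLO window;
# a burn rate of 14.4 means a 30-day budget would be gone in about 2 days.
# Numbers below are illustrative only.

def burn_rate(bad_events: int, total_events: int, slo: float) -> float:
    """Burn rate over some measurement window (e.g. the last hour)."""
    if total_events == 0:
        return 0.0
    error_rate = bad_events / total_events
    error_budget = 1.0 - slo  # e.g. 0.001 for a 99.9% SLO
    return error_rate / error_budget

# Example: 99.9% SLO, 144 bad out of 10,000 requests in the last hour
rate = burn_rate(bad_events=144, total_events=10_000, slo=0.999)
print(round(rate, 1))  # 14.4 -> alert, since the budget is burning far too fast
```

If I understand correctly, a Splunk scheduled search would compute something like this over two windows (e.g. 1h and 5m) and fire only when both exceed the threshold, but I'd appreciate confirmation.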