Hi,
For our cloud-hosted API monitoring, we've implemented error-based and performance-based (response time) health rules (HRs) for each of our APIs and mobile app Network Requests. We monitor at this granular level so we can tie the HRs to health status indicators on our dashboards and see at a glance, in a single dashboard view, exactly which APIs are experiencing issues.
For our performance-based health rules, we have two alerting criteria: response time versus an AI-established baseline (alerting when it exceeds a set number of standard deviations), and a static threshold (a "must not exceed" response time). The static threshold is there to catch slow performance degradation over a long period, as well as response time spikes that the baseline treats as normal.
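To make the two criteria concrete, here is a rough sketch of the logic in Python. It is only an illustration of the idea, not how AppDynamics evaluates health rules internally; the `HealthRule` class, its field names, and the sample numbers are all made up for this example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class HealthRule:
    # Hypothetical parameters for illustration, not actual AppDynamics settings.
    max_std_devs: float      # allowed standard deviations above the baseline mean
    static_limit_ms: float   # "must not exceed" response time in milliseconds


def evaluate(rule, baseline_samples_ms, current_ms):
    """Return the names of the criteria that the current response time violates."""
    base_mean = mean(baseline_samples_ms)
    base_std = stdev(baseline_samples_ms)
    violations = []
    # Criterion 1: deviation from the learned baseline.
    if current_ms > base_mean + rule.max_std_devs * base_std:
        violations.append("baseline")
    # Criterion 2: fixed ceiling, independent of what the baseline has learned.
    if current_ms > rule.static_limit_ms:
        violations.append("static")
    return violations


rule = HealthRule(max_std_devs=3, static_limit_ms=500)

# Sudden spike against a healthy baseline: only the baseline criterion fires.
spike = evaluate(rule, [100, 110, 90, 105, 95], 200)

# Slow drift: the baseline has crept up to around 500 ms, so 515 ms looks
# normal to it, and only the static threshold catches the degradation.
drift = evaluate(rule, [480, 520, 500, 510, 490], 515)
```

The second call is the case we are worried about: once the baseline has absorbed a gradual slowdown, only the static limit still flags it.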
Is this a recommended approach, or does the AppD community/AppD team recommend only baseline-based thresholds for BT/Network Request performance monitoring?
My concern is that static thresholds require more maintenance over time and will become an operational burden.
^ Edited by @Ryan.Paredez for a more searchable title
One argument for including static thresholds is business and industry response time SLAs, which are static values.
Hi @Jason.Gill,
Thanks for asking your question on the Community. Let's see if our Community All-star @Mario.Morelli can offer some best practices insight.