Hi, I want to customize my alert threshold based on the number of events. For example, the query below alerts when the failure rate is greater than 25% over 10 minutes, but the alert is too noisy when the number of calls is low.
The number of calls per 10-minute window ranges from 5 to 4000 throughout the day.
I want to bin my calls with different alert thresholds, e.g. total calls between 5-10 -> 60%, 10-20 -> 50%, 20-30 -> 35%, and so on, rather than keeping a static threshold. Please suggest an approach. Thanks.
index=abc sourcetype=abc:logs service_name="abc"|eval failure=if(response_time> 3,1,0)|timechart span=10m sum(success) as "failed_calls",count as "total_calls"|eval failure%=(failed_calls/total_calls)*100|where failure% >25
Try something like this. (I believe there was a typo in your original query, where you use sum(success) instead of sum(failure); please check.)
index=abc sourcetype=abc:logs service_name="abc"|eval failure=if(response_time> 3,1,0)
|timechart span=10m sum(failure) as "failed_calls",count as "total_calls"|eval "failure%"=(failed_calls/total_calls)*100
| eval threshold=case(total_calls<10,60, total_calls<20,50, total_calls<30,35, .....add other conditions per your need...., true(),25)
|where 'failure%' >threshold
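To see how the case() thresholds behave before wiring them into the alert, here is a small self-contained sketch that runs without any index. The sample rows (total_calls and failed_calls values) and the helper fields n and alert are made up for illustration; the bins are the ones from the question, with 25 as the fallback, and you would still need to add your own remaining bins.
```
| makeresults count=4
| streamstats count as n
| eval total_calls=case(n=1,8, n=2,15, n=3,25, n=4,400)
| eval failed_calls=case(n=1,4, n=2,8, n=3,9, n=4,80)
| eval "failure%"=round((failed_calls/total_calls)*100, 1)
| eval threshold=case(total_calls<10,60, total_calls<20,50, total_calls<30,35, true(),25)
| eval alert=if('failure%' > threshold, "yes", "no")
| table total_calls failed_calls "failure%" threshold alert
```
Note that case() returns the value for the first condition that matches, so the conditions have to be listed from the smallest call count upwards, and true() at the end acts as the default for everything else.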