I have a scheduled alert that looks at one specific event type and is set to trigger if the 90th percentile of time taken exceeds a threshold. Because this can happen fairly often, I have set the throttle so that once the alert triggers, it does not trigger again for three hours. However, the alert seems to ignore the throttle and triggers whenever it wants.
The alert is configured as follows:
Search: <base search> | eval time_taken_s=time_taken/1000 | stats perc90(time_taken_s) as response_percentile, count as Count
Condition: If custom condition is met
Custom Condition Search: search response_percentile > 20 AND Count > 30
Alert mode: Once per search
After triggering the alert, don't trigger it again for: 3 Hour(s)
If anyone can see what I am doing wrong, I would greatly appreciate it.
1) You only want to trigger when the time taken goes beyond the threshold more than ten percent of the time in the prior hour, and once that happens, you don't want to hear about it again for three hours. That is not a valid use case for a real-time search. Change it to indexed real time, or to near-real time (that is, a scheduled search that runs every 1 or 5 minutes), and you will put a lot less stress on the machines and avoid a lot of headaches with how the alert behaves.
2) Once you have made that change, you can move your custom condition into the SPL itself as a final where clause, and then trigger on number of results > 0.
3) The search could also be written more directly. You are checking that more than ten percent of your time_taken values are above 20 seconds (20000 ms) and that there are more than 30 events in the hour, so the following is equivalent (apart from how Splunk computes perc90 right at the ten percent boundary):
<base search>
| eval is_slow=if(time_taken>20000,100.00,0.00)
| stats count, avg(is_slow) as pct_slow
| where pct_slow > 10.00 AND count > 30
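Applied to the original search, point 2 might look roughly like this. It is a sketch: <base search> stands for your existing base search, unchanged, and the time range (for example, the last 60 minutes) is assumed to be set in the alert's own time-range setting, with the search scheduled every 5 minutes as suggested in point 1.

```spl
<base search>
| eval time_taken_s=time_taken/1000
| stats perc90(time_taken_s) as response_percentile, count as Count
| where response_percentile > 20 AND Count > 30
```

Because stats produces a single row and the where clause drops it whenever the condition is not met, the search returns either zero results or one. The trigger condition can then be the standard "Number of Results is greater than 0" instead of a custom condition, keeping Alert mode "Once per search" and the 3-hour throttle as before.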