Hi, I need to create some monitoring and alerts based on high response time of my landing page. The thing is there are always some blips so I want to rule that out and only trigger notifications when there is a consistently high response time for a period of time say 20 mins or 30 mins.
How can I write a query like that? I have written a very generic query, shown below, that gives me the average and 90th percentile response time for every 5 minutes, but I want to trigger the alert only when response times are consistently high.
Let me know if anyone has any suggestions.
index=myapp_prod sourcetype=ssl_access_combined requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000
| timechart span=5m avg(responseTime) as AverageResponseTime p90(responseTime) as 90thPercentile
As an example - let's say I want to run the alert every 30 mins and check the condition if there are consistently high response times in last 30 mins or 1 hour, then trigger the alert to send out notifications.
Any help is appreciated.
Best Regards,
Sha
Hi @shashank_24,
Looking at your search, I assume that responseTime is expressed in microseconds. Is that correct?
I'm also not sure whether your alert should be based on an average value or on a peak value.
If it is based on an average value, you could run something like this:
index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000
| stats avg(responseTime) AS avg_responseTime
| where avg_responseTime>60*30
If instead you want a peak value:
index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000
| where responseTime>60*30
In both cases the trigger condition is that there are results.
If you want 60 minutes instead of 30, change earliest to -60m@m and replace the last number in the last row.
Ciao.
Giuseppe
Hi @gcusello, thanks for your response. The query you shared is the one I already have: it computes the average response time over 30 minutes and then sends the alert. What I am interested in is looking at each 5-minute interval over the last 30 minutes and triggering the alert only if the response time is consistently high in every interval. This way we rule out outliers and small blips.
For example, if there is a small blip for 5 minutes and the response time returns to normal after that, I don't want to be notified, because the issue resolved itself. If the high response time persists, there may be an actual issue and our Ops team needs to be notified.
I hope that makes sense. And yes, the time is in microseconds.
Hi @shashank_24,
your approach is correct:
you could create two alerts:
Obviously, the thresholds must be different; otherwise the second alert isn't useful.
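For the "consistently high in every 5-minute interval" condition described above, one possible sketch is to bucket the last 30 minutes into 5-minute spans, compute the 90th percentile per bucket, and return a result only when every bucket is over the threshold. The 2-second value and the field names hypothetical_threshold_s, p90_responseTime, is_high, total_buckets and high_buckets are assumptions for illustration, not something taken from your environment, so adjust them to your data:

```
index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime=responseTime/1000000
| bin _time span=5m
| stats p90(responseTime) AS p90_responseTime BY _time
| eval hypothetical_threshold_s=2
| eval is_high=if(p90_responseTime>hypothetical_threshold_s, 1, 0)
| stats count AS total_buckets sum(is_high) AS high_buckets
| where total_buckets>0 AND high_buckets=total_buckets
```

As with the searches above, the trigger condition would be that there are results. A single blip leaves at least one bucket below the threshold, so the final where clause returns nothing. Note that the buckets created by bin are aligned to 5-minute boundaries, so this works best if the alert is scheduled on such a boundary (for example every 30 minutes on the hour and half hour); otherwise the first and last buckets only cover part of an interval.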
Ciao.
Giuseppe