Alerting

How do I create monitoring and alerts for consistently high response times on my landing page?

shashank_24
Path Finder

Hi, I need to create some monitoring and alerts based on the response time of my landing page. The thing is, there are always some blips, so I want to rule those out and only trigger notifications when response times are consistently high for a period of time, say 20 or 30 minutes.

How can I write a query like that? I have written a very generic query (below) which gives me the average and 90th percentile response time for every 5 minutes, but I want to trigger the alert only when response times are consistently high.

Let me know if anyone has any suggestions.

 

index=myapp_prod sourcetype=ssl_access_combined requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000 
| timechart span=5m avg(responseTime) as AverageResponseTime p90(responseTime) as 90thPercentile

 

As an example, let's say I want to run the alert every 30 minutes and check whether response times have been consistently high over the last 30 minutes or 1 hour; if so, trigger the alert to send out notifications.
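Something along these lines is roughly what I'm imagining, a sketch assuming responseTime is in microseconds, with a placeholder 2-second threshold and an "all six 5-minute buckets must be high" rule that I'd tune:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime=responseTime/1000000
| bin _time span=5m
| stats avg(responseTime) AS AverageResponseTime BY _time
| where AverageResponseTime>2
| stats count AS highBuckets
| where highBuckets>=6

The alert itself would then just trigger when the search returns a result.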

Any help is appreciated.

Best Regards,
Sha


gcusello
SplunkTrust

Hi @shashank_24,

looking at your search, I suppose that your responseTime is expressed in microseconds, is that correct?

Also, I don't understand whether your alert should be based on an average value or on a peak value.

If it's on an average value, you could run something like this:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000 
| stats avg(responseTime) AS avg_responseTime
| where avg_responseTime>60*30

If instead you want a peak value:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime= responseTime/1000000 
| stats max(responseTime) AS max_responseTime
| where max_responseTime>60*30

In both cases the trigger condition is that there are results.

If you want 60 minutes instead of 30, replace the last number in the last row (and change earliest to -60m@m accordingly).

Ciao.

Giuseppe


shashank_24
Path Finder

Hi @gcusello Thanks for your response. The query you shared is the one I already have. It looks at the average response time over 30 minutes and then sends the alert, BUT what I am interested in is looking at every 5 minutes over the last 30 minutes and checking whether the response time is high in every 5-minute window; only then should it trigger. This way we rule out the outliers or small blips.

What happens is, let's say we have a small blip over 5 minutes and the response time comes back to normal after that; in that case I don't want to be notified, because the issue resolved itself. But if the response time stays high, there may be an actual issue and our Ops team needs to be notified.

Hope I am making sense. And yes, the time is in microseconds.


gcusello
SplunkTrust

Hi @shashank_24,

your approach is correct: you could create two alerts:

  • one running every 5 minutes,
  • one running every 30 minutes,

obviously with different thresholds, otherwise the second one isn't useful.
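For example, something like the following pair of searches, where the thresholds of 2 and 1 seconds are only placeholders to tune for your traffic:

Alert scheduled every 5 minutes:

index=myapp_prod sourcetype=ssl_access_combined earliest=-5m@m latest=@m requested_content="/myapp/products*"
| eval responseTime=responseTime/1000000
| stats avg(responseTime) AS avg_responseTime
| where avg_responseTime>2

Alert scheduled every 30 minutes, with a lower threshold:

index=myapp_prod sourcetype=ssl_access_combined earliest=-30m@m latest=@m requested_content="/myapp/products*"
| eval responseTime=responseTime/1000000
| stats avg(responseTime) AS avg_responseTime
| where avg_responseTime>1

In both cases the trigger condition is that there are results.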

Ciao.

Giuseppe
