Alerting

Conditional alert based on comparing results across 30-minute windows

poddraj
Explorer

Hi,
My scenario: I have counts of Total requests, Success, Failure, and Failure% for every 30-minute span over the last 2 hours.
Say the first 30 minutes (the most recent interval) had 100 hits. If its failure% is more than 60%, I want to send an alert immediately. If its failure% is between 30% and 50%, I want to check the failure% of the previous 30 minutes. If that is also between 30% and 50%, I want to check the 30 minutes before that, and if that interval is in the same range too, I want to trigger the alert. If the 2nd or 3rd 30-minute interval has less than 30% failure, I do not want to send an alert.

I want this alert to run every 15 minutes. How can I do this in Splunk?
I have written the query below to get the events for the last 2 hours, but could not work out the next steps.

index=dte_fios sourcetype=dte2_Fios FT=*FT
| eval Interval=strftime('_time',"%d-%m-%Y %H:%M:%S")
| eval Status=case(Error_Code=="0000","Success",1=1,"Failure")
| timechart span=30m count by Status
| eval Total = Success + Failure
| eval Failure%=round(Failure/Total*100)
| table _time,Total,Success,Failure,Failure%

Output is below:
_time                Total  Success  Failure  Failure%
2020-04-20 05:00:00  75     61       14       19
2020-04-20 05:30:00  207    129      78       38
2020-04-20 06:00:00  25     10       15       60

1 Solution

to4kawa
Ultra Champion
| makeresults
| eval _raw="time,Total,Success,Failure,Failure_perc
2020-04-20 05:00:00,75,61,14,19
2020-04-20 05:30:00,207,129,78,38
2020-04-20 06:00:00,25,10,15,60"
| multikv forceheader=1
| eval _time=strptime(time,"%F %T")
| table _time,Total,Success,Failure,Failure_perc
| rename COMMENT as "this is your result. from here, the logic"
| autoregress Failure_perc p=2 as F_anHourAgo 
| autoregress Failure_perc p=1 as F_30MinsAgo
| eval alert=case(Failure_perc > 60, 1
, (30 <= Failure_perc AND Failure_perc <= 50) AND (30 <= F_30MinsAgo AND F_30MinsAgo <= 50) AND (30 <= F_anHourAgo AND F_anHourAgo <= 50), 1
, true(), 0)

timechart output is sorted in ascending time order by default,
so autoregress can carry the values of the earlier intervals onto the current row.
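
To see what the two autoregress lines leave on each row, here is a minimal Python sketch (not Splunk code) that emulates the lag step on the sample Failure_perc values from the question. The helper name `add_lags` is hypothetical, and using None for a missing earlier row is an assumption made for illustration.

```python
# Offline emulation of lagging a column over rows sorted in ascending time order.
# Sample values are the Failure_perc figures from the question.
rows = [
    ("2020-04-20 05:00:00", 19),
    ("2020-04-20 05:30:00", 38),
    ("2020-04-20 06:00:00", 60),
]


def add_lags(rows, max_lag=2):
    """Append the value from 1..max_lag rows earlier to each row.

    A lag is None when there is no row that far back.
    """
    out = []
    for i, (ts, perc) in enumerate(rows):
        lags = [rows[i - k][1] if i - k >= 0 else None
                for k in range(1, max_lag + 1)]
        out.append((ts, perc, *lags))
    return out


# Columns: time, Failure_perc, value 30 minutes ago, value an hour ago
lagged = add_lags(rows)
for row in lagged:
    print(row)
```

Only the last row has both earlier values filled in, so the three-interval condition can only be true once at least three intervals are in the search window.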

...
| where alert = 1

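Append this to the search so that only rows meeting the alert condition remain.
By the way, should the alert fire when the failure% is between 50% and 60%? The conditions above do not cover that range.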
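
To make the 50% to 60% gap concrete, here is a small Python sketch of the rule as stated in the question, useful for checking a few values by hand before wiring up the alert. The function name `should_alert` and its parameters are hypothetical, and treating the gap as "no alert" is an assumption, since the question does not say.

```python
from typing import Optional


def should_alert(current: float,
                 prev_30m: Optional[float],
                 prev_60m: Optional[float]) -> bool:
    """Return True when the rule from the question would fire.

    current  : failure% of the latest 30-minute interval
    prev_30m : failure% of the interval before it (None if missing)
    prev_60m : failure% of the interval before that (None if missing)
    """
    # Strictly more than 60% fires immediately.
    if current > 60:
        return True

    def in_band(value: Optional[float]) -> bool:
        # "Between 30% and 50%", both ends inclusive.
        return value is not None and 30 <= value <= 50

    # Otherwise all three consecutive intervals must sit in the 30-50 band.
    # Assumption: anything else, including 50 < current <= 60, does not fire.
    return in_band(current) and in_band(prev_30m) and in_band(prev_60m)


print(should_alert(61, None, None))  # over 60%
print(should_alert(40, 35, 45))      # three intervals in the band
print(should_alert(40, 35, 20))      # oldest interval below 30%
print(should_alert(55, 40, 40))      # current value falls in the gap
```

With the sample output from the question, the latest interval is exactly 60%, which is not "more than 60%", so this sketch would not fire for it.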
