Hi there,
I have this query that reports the HTTP status-code error rate:
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| stats count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total | fields ErrorRate
This works, but it seems to compute a single error rate over the whole 30-minute window, so a one-time spike is enough to push it over the threshold.
What I want is to alert only if the error rate stays above the threshold continuously for 10 minutes within the last 30 minutes (or 1 hour).
How can I achieve that?
Thanks,
DD
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| bin _time span=10m
| stats count(eval(statusCode>=400)) as errors, count as total by _time
| eval ErrorRate = errors * 100 / total | fields ErrorRate _time
| where ErrorRate > your_threshold
Hi,
Have you tried streamstats with the time_window option?
r. Ismo
Hi all,
I am still having trouble with this. Can someone give me some examples?
Thanks,
DD
Hi,
It should be something like this. I don't have your data, so I can't test it exactly.
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| streamstats time_window=10m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total | fields _time,ErrorRate
| where ErrorRate > Threshold
r. Ismo
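Building on the streamstats idea, here is a minimal, untested sketch of one way to express "above the threshold for 10 continuous minutes". It computes a per-minute error rate with timechart, then uses streamstats window=10 to take the minimum of the last 10 one-minute rates. If even that minimum is above the threshold, every minute in the 10-minute stretch was above it. The MinutesInWindow check skips the first rows, where fewer than 10 minutes are available. Assumptions: the threshold of 5 (percent) is a hypothetical placeholder, the field names MinRateLast10m and MinutesInWindow are made up for illustration, and minutes with no traffic are treated as a 0% error rate, which breaks a streak.

```spl
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| timechart span=1m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = if(total > 0, errors * 100 / total, 0)
| streamstats window=10 min(ErrorRate) as MinRateLast10m, count as MinutesInWindow
| where MinutesInWindow = 10 AND MinRateLast10m > 5
```

With this shape, the alert can trigger when the number of results is greater than 0. For the 1-hour variant, change earliest=-30m to earliest=-60m. Note that the most recent one-minute bucket may be partial.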
@dpdwibedy
This is not exactly the solution you are looking for, but this is how we handle the same kind of situation in our environment:
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| timechart span=5m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total | fields _time,ErrorRate
| where ErrorRate > Threshold
If you want to alert only when the error rate has been above the threshold for 15 minutes, configure the alert to trigger when the number of results is greater than 2. The alert then triggers when the threshold is crossed in 3 of the 5-minute buckets within the 30-minute window.
Explanation: the query above splits the 30-minute window into 5-minute buckets, and the alert triggers if the error rate is above the threshold in 3 of those buckets. Note that the 3 buckets do not have to be consecutive. This solves the problem for our use case.
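If the 15 minutes do need to be continuous, one option is to flag each 5-minute bucket that breaches the threshold and use streamstats window=3 to add up the flags of the current bucket and the two before it. Because timechart returns buckets in ascending time order, a sum of 3 means three consecutive breaching buckets. This is an untested sketch: the threshold of 5 (percent) and the field names Breach and BreachesInLast3 are hypothetical, and empty buckets are treated as non-breaching.

```spl
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| timechart span=5m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = if(total > 0, errors * 100 / total, 0)
| eval Breach = if(ErrorRate > 5, 1, 0)
| streamstats window=3 sum(Breach) as BreachesInLast3
| where BreachesInLast3 = 3
```

With this version, the alert condition becomes "number of results is greater than 0" instead of "greater than 2".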