Splunk Search

Create an alert if the error rate is over 10 (or X) for 15 minutes continuously

dpdwibedy
Explorer

Hi There,

I have this query that reports the error rate based on status codes.

index=apache_core  userAgent!="nginx/*" source="*access.log*"  requestURI!="/web/app*" NOT (requestURI="/api/xyz/*"  OR requestURI="/api/yyy/*"  AND statusCode=404) earliest=-30m latest=now 
| stats count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total | fields ErrorRate

 

This works, but it averages the error rate over the whole 30 minutes, so a single one-time spike can push the result over the threshold.

What I want is to alert only if the error rate stays above the threshold for 10 continuous minutes within the last 30 minutes or 1 hour.

How can I achieve that?

 

Thanks,

DD


thambisetty
SplunkTrust

index=apache_core  userAgent!="nginx/*" source="*access.log*"  requestURI!="/web/app*" NOT (requestURI="/api/xyz/*"  OR requestURI="/api/yyy/*"  AND statusCode=404) earliest=-30m latest=now
| bin _time span=10m
| stats count(eval(statusCode>=400)) as errors, count as total by _time
| eval ErrorRate = errors * 100 / total | fields ErrorRate _time
| where ErrorRate > yourthreshold
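
The query above returns one row for each 10-minute bucket whose error rate is over the threshold, so the alert still has to decide how many rows count as "continuous". One possible way to express that in the search itself is sketched below. It is untested against this data and rests on a few assumptions: the threshold is 10 (taken from the thread title), the search covers exactly the last 10 whole minutes, and the field names MinErrorRate and buckets are only illustrative. The idea is to compute the error rate per 1-minute bucket and keep a result only if the lowest of those rates is still above the threshold.

```spl
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-10m@m latest=@m
| bin _time span=1m
| stats count(eval(statusCode>=400)) as errors, count as total by _time
| eval ErrorRate = errors * 100 / total
| stats min(ErrorRate) as MinErrorRate, count as buckets
| where MinErrorRate > 10 AND buckets = 10
```

With this shape the alert could be scheduled every minute or so and set to trigger when the number of results is greater than 0. The buckets = 10 check is there because stats ... by _time produces no row for a minute without events; drop it if quiet minutes should not block the alert.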


isoutamo
SplunkTrust

Hi

Have you tried streamstats with time_window?

r. Ismo


dpdwibedy
Explorer

@isoutamo: No, I haven't.

I am not very good with Splunk queries. Could you give me an example?

 

Thanks,

DD


dpdwibedy
Explorer

Hi All,

I am still stuck on this. Can someone give me an example?

Thanks,

DD

 


isoutamo
SplunkTrust

Hi

It should be something like this. I don't have your data, so I can't test it exactly.

index=apache_core  userAgent!="nginx/*" source="*access.log*"  requestURI!="/web/app*" NOT (requestURI="/api/xyz/*"  OR requestURI="/api/yyy/*"  AND statusCode=404) earliest=-30m latest=now 
| streamstats time_window=10m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total | fields _time,ErrorRate
| where ErrorRate > Threshold

r. Ismo 
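
For anyone who wants to find a sustained breach anywhere inside a longer search range, a variation on the streamstats idea is sketched below. It is untested and makes several assumptions: the threshold is 10 (from the thread title), "continuous" means ten consecutive 1-minute buckets, and the field names breach and breaches_last_10 are made up for illustration. It uses the row-count option window=10 on per-minute rows rather than time_window on raw events. timechart is used so that minutes without any events still get a row, and those minutes are treated as not breaching.

```spl
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m@m latest=@m
| timechart span=1m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total
| eval breach = if(total > 0 AND ErrorRate > 10, 1, 0)
| streamstats window=10 sum(breach) as breaches_last_10
| where breaches_last_10 = 10
```

A row survives only when it and the nine rows before it all breached, so the first nine minutes of the range can never match. The alert could then trigger when the number of results is greater than 0. Widening earliest to -60m@m would cover the 1-hour case from the question.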


impurush
Contributor

@dpdwibedy 
This is not exactly what you are looking for, but we handle the same kind of situation in our environment as follows:

index=apache_core  userAgent!="nginx/*" source="*access.log*"  requestURI!="/web/app*" NOT (requestURI="/api/xyz/*"  OR requestURI="/api/yyy/*"  AND statusCode=404) earliest=-30m latest=now 
| timechart span=5m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total | fields _time,ErrorRate
| where ErrorRate > Threshold

If you want the alert to fire when the error rate stays high for about 15 minutes, configure the alert to trigger when the number of results is greater than 2. The alert then fires if the threshold is crossed in at least three 5-minute buckets within the 30-minute window.

Explanation: the query splits the 30-minute window into 5-minute buckets and keeps only the buckets whose error rate is above the threshold. If three or more buckets remain, the alert triggers. The buckets do not have to be consecutive, but this is good enough for our use case.
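
If it is preferable to keep the "at least three buckets" rule in the search rather than in the alert's trigger condition, the count can be done in SPL. The following is an untested sketch that assumes a threshold of 10 (from the thread title); breaching_buckets is just an illustrative field name.

```spl
index=apache_core userAgent!="nginx/*" source="*access.log*" requestURI!="/web/app*" NOT (requestURI="/api/xyz/*" OR requestURI="/api/yyy/*" AND statusCode=404) earliest=-30m latest=now
| timechart span=5m count(eval(statusCode>=400)) as errors, count as total
| eval ErrorRate = errors * 100 / total
| where ErrorRate > 10
| stats count as breaching_buckets
| where breaching_buckets >= 3
```

The alert would then trigger when the number of results is greater than 0. Because the time range is not snapped to 5-minute boundaries, the first and last buckets may be partial, which can make their error rates noisier than the others.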
