Splunk Search

Splunk alert for peak errors only

shashank_24
Path Finder

Hi, I am working on a query to write an alert where I need to monitor a few pages for 500 errors. There are currently some investigations going on to fix those errors, so we usually get a few every 30 minutes.

So at the moment not every 5xx is a problem, and I know it's hard for us to see which of those are "real problems" and which are not.

What I want to do is adjust our thresholds in the Splunk search so that it only recognises peaks, which are surely something "going wrong", and only then triggers the alert.

I have an alert like this at the moment, which is triggered every 30 minutes.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50* 
| stats count by status
| where (status=500 AND count > 10)

 

Right now it gets triggered when the count of 500 errors exceeds 10 in the last 30 minutes, but I don't want that as it is unnecessarily flooding the mailbox.

Is there a way to achieve what I need here, please?


bowesmana
SplunkTrust

@shashank_24 

How to determine 'peak'?

There are a number of ways to look at this, but you could do a timechart and look for the count per minute rather than the total over 30 minutes, e.g.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| where '500'>10

which will look at the number of 500s per minute and only alert if there are more than 10 in a single minute.
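If you want the alert itself to stay on a 30-minute schedule and fire just once when any minute spikes, one option (only a sketch using the same field names - the "500" column is assumed to come from the timechart split by status, and the threshold of 10 is the one from your original search) is to collapse the per-minute counts down to the worst minute and alert on that:

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| rename "500" as count_500
| stats max(count_500) as max_500_per_minute
| where max_500_per_minute > 10

That way the search returns either one row or nothing, so a "number of results > 0" trigger condition sends at most one mail per 30-minute run instead of one per spiky minute.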

Or you could look for outliers, but this would require a bit of tuning for your data.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500 
| bin _time span=1m
| stats count by _time
| streamstats window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval lower_bound = avg - (stdev * multiplier)
| eval upper_bound = avg + (stdev * multiplier)
| eval lower_outlier = if(count < lower_bound, 1, 0)
| eval upper_outlier = if(count > upper_bound, 1, 0)
| where upper_outlier=1

so this flags any minute where the count is more than 2 standard deviations above the average of the rolling 10-minute window. Play with the window/multiplier on your historical data to find numbers that work for your data. I left the lower_outlier calc in there as an example of picking up values below the lower bound as well.
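One thing to watch with a pure stdev approach is that when the baseline is very quiet (average close to zero), even 2 or 3 errors in a minute can land above the upper bound and fire the alert. A possible tweak (again just a sketch - the floor of 10 is an arbitrary number I picked, choose whatever fits your traffic) is to combine the outlier test with a minimum absolute count:

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| bin _time span=1m
| stats count by _time
| streamstats window=10 avg(count) as avg, stdev(count) as stdev
| eval upper_bound = avg + (stdev * 2)
| where count > upper_bound AND count > 10

That keeps the "is this unusual for us" logic but stops a blip from 0 to 3 errors from paging anyone.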

Hope this helps.
