Splunk Search

Splunk alert for peak errors only

shashank_24
Path Finder

Hi, I am working on a query to write an alert where I need to monitor a few pages for 500 errors. Currently there are some investigations going on to fix those errors, so we usually get a few every 30 minutes.

So at the moment not every 5xx is a problem, and I know it's hard for us to see which of those are "real problems" and which are not.

What I want to do is adjust the thresholds in the Splunk search so that it only recognises peaks that are surely something "going wrong", and only then trigger the alert.

I have an alert like this at the moment, which runs every 30 minutes.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50* 
| stats count by status
| where (status=500 AND count > 10)

 

Right now it gets triggered when the count of 500 errors exceeds 10 in the last 30 minutes, but I don't want that as it is unnecessarily flooding the mailbox.

Is there a way to achieve what I need here, please?


bowesmana
SplunkTrust

@shashank_24 

How to determine 'peak'?

There are a number of ways to look at this, but you could do a timechart and look for the count per minute rather than the total over 30 minutes, e.g.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| where '500'>10

which will look at the number of 500s per minute and only alert if there are more than 10 in a minute.
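
If you want the alert itself to stay simple (trigger on "number of results > 0"), you could also narrow the search to the 500s and require more than one bad minute before anything is returned. This is just a sketch - the threshold of 10 per minute and the 2-minute requirement are assumptions to tune against your traffic:

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| timechart span=1m count
| where count > 10
| stats count as minutes_over_threshold
| where minutes_over_threshold >= 2

That only returns a row when at least 2 separate minutes inside the alert window each have more than 10 errors, which should cut down the mailbox noise further.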

Or you could look for outliers, but this would require a bit of tuning for your data.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500 
| bin _time span=1m
| stats count by _time
| streamstats window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval lower_bound = avg - (stdev * multiplier)
| eval upper_bound = avg + (stdev * multiplier)
| eval lower_outlier = if(count < lower_bound, 1, 0)
| eval upper_outlier = if(count > upper_bound, 1, 0)
| where upper_outlier=1

So this flags any minute whose count is more than 2 standard deviations above the rolling 10-minute average. Play with the window/multiplier on your historical data to find numbers that work for your data. I left the lower_outlier calc in there as an example of picking up values below the lower bound as well.
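
One tweak that can help, since streamstats includes the current event in its window by default: add current=f so the minute being tested is compared against the previous 10 minutes only, not partly against itself. Just a sketch of the same idea with that option, still using an assumed multiplier of 2 that you would tune:

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| bin _time span=1m
| stats count by _time
| streamstats current=f window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval upper_bound = avg + (stdev * multiplier)
| where count > upper_bound

With current=f a sudden burst is measured against the quieter baseline before it, rather than being averaged into its own bound.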

Hope this helps.
