
Splunk alert for peak errors only

shashank_24
Path Finder

Hi, I am working on a query to write an alert where I need to monitor a few pages for 500 errors. Currently there are investigations going on to fix those errors, so we usually get a few every 30 minutes.

So basically, at the moment, not every 5xx is a problem, and it's hard for us to see which of those are "real problems" and which are not.

What I want to do is adjust our thresholds in the Splunk search so that it only recognises peaks that clearly indicate something going wrong, and only then trigger the alert.

I have an alert like this at the moment, which runs every 30 minutes.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50* 
| stats count by status
| where (status=500 AND count > 10)

 

Right now it gets triggered when the count of 500 errors reaches 10 in the last 30 minutes, but I don't want that as it is unnecessarily flooding the mailbox.

Is there a way to achieve what I need here, please?


bowesmana
SplunkTrust

@shashank_24 

How to determine 'peak'?

There are a number of ways to look at this, but you could do a timechart and look for the count per minute rather than the total over 30 minutes, e.g.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| where '500'>10

which will look at the number of 500s per minute and only alert if there are more than 10 in a single minute.
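
If a single spiky minute is still too noisy, one option is to require several over-threshold minutes in the 30-minute window before the alert fires. This is just a sketch built on the same base search; the spiky_minutes field name and the threshold of 3 minutes are illustrative, not tested against your data.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| where '500'>10
| stats count as spiky_minutes
| where spiky_minutes >= 3

With the alert condition set to "number of results > 0", this only fires when at least 3 separate minutes in the window each saw more than 10 errors.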

Or you could look for outliers, but this would require a bit of tuning for your data.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500 
| bin _time span=1m
| stats count by _time
| streamstats window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval lower_bound = avg - (stdev * multiplier)
| eval upper_bound = avg + (stdev * multiplier)
| eval lower_outlier = if(count < lower_bound, 1, 0)
| eval upper_outlier = if(count > upper_bound, 1, 0)
| where upper_outlier=1

So this looks for any minute where the count is more than 2 standard deviations above the average over the rolling 10-minute window. Play with the window/multiplier on your historical data to find values that work for your data. I left the lower_outlier calc in there as an example of also picking up values below the lower bound.
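A variation on the above, again only a sketch: streamstats supports current=f, which excludes the current minute from its own baseline, and adding an absolute floor (the 10 here is an arbitrary example) stops very quiet periods from triggering on tiny counts.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| bin _time span=1m
| stats count by _time
| streamstats current=f window=10 avg(count) as avg, stdev(count) as stdev
| eval upper_bound = avg + (stdev * 2)
| where count > upper_bound AND count > 10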

Hope this helps.
