
Splunk alert for peak errors only

shashank_24
Path Finder

Hi, I am writing an alert that monitors a few pages for 500 errors. Some investigations are currently under way to fix those errors, so we usually get a few every 30 minutes.

So at the moment not every 5xx is a problem, and it's hard for us to tell which of them are "real problems" and which are not.

What I want to do is adjust the thresholds in the Splunk search so that it only recognises peaks, which surely mean something is "going wrong", and only then triggers the alert.

At the moment I have an alert like this, scheduled to run every 30 minutes:

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50* 
| stats count by status
| where (status=500 AND count > 10)

 

Right now it triggers whenever there are more than 10 500 errors in the last 30 minutes, but I don't want that, because it is needlessly flooding the mailbox.

Is there a way to achieve this, please?


bowesmana
SplunkTrust

@shashank_24 

How do you determine a 'peak'?

There are a number of ways to look at this, but you could do a timechart and look for the count per minute rather than the total over 30 minutes, e.g.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| where '500'>10

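This counts the 500s in each one-minute bucket and only returns a result (and so only triggers the alert) for a minute with more than 10 of them. The field name 500 is wrapped in single quotes because in where and eval, single quotes mark a field name, which is required for names that start with a digit.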
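If a single busy minute is still too noisy, one possible refinement is to alert only when the per-minute threshold is exceeded for several consecutive minutes. This is a sketch, not a tested recipe: the threshold of 10 is taken from the original alert, and requiring 3 consecutive minutes is a hypothetical value to tune. The eval marks each minute above the threshold with 1, streamstats sums those marks over the current row and the two before it, and where keeps only minutes that close a run of three.

```spl
index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| timechart span=1m count as errors_500
| eval over_threshold = if(errors_500 > 10, 1, 0)
| streamstats window=3 sum(over_threshold) as minutes_over
| where minutes_over >= 3
```

Because timechart fills minutes that have no events with a count of 0, the three rows in each window correspond to three consecutive minutes, so a gap in the errors resets the run.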

Alternatively, you could look for outliers, although this needs a bit of tuning for your data:

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500 
| bin _time span=1m
| stats count by _time
| streamstats window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval lower_bound = avg - (stdev * multiplier)
| eval upper_bound = avg + (stdev * multiplier)
| eval lower_outlier = if(count < lower_bound, 1, 0)
| eval upper_outlier = if(count > upper_bound, 1, 0)
| where upper_outlier=1

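This flags any minute whose count is more than 2 standard deviations above the average of a rolling 10-row window (the current minute plus the previous 9 rows; since bin plus stats only produces rows for minutes that had at least one 500, those rows are not necessarily 10 consecutive minutes). Play with the window and multiplier on your historical data to find values that work for your data. I left the lower_outlier calculation in there as an example of also catching values below the lower bound.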
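One possible variation of the outlier search, offered as a sketch under stated assumptions rather than a drop-in alert: timechart span=1m fills minutes with no 500s with a count of 0 (bin plus stats skips them), and streamstats current=f leaves the minute being tested out of its own baseline, so a spike does not inflate the average and standard deviation it is compared against. The extra floor of count > 10 is a hypothetical value borrowed from the original alert; it keeps a quiet baseline (standard deviation near 0) from turning one or two errors into an alert.

```spl
index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| timechart span=1m count
| streamstats window=10 current=f avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval upper_bound = avg + (stdev * multiplier)
| where count > upper_bound AND count > 10
```

On the first row current=f leaves avg and stdev empty, so upper_bound is null and the where clause does not match that row. The first few minutes of each run are compared against a shorter baseline, which is worth keeping in mind when the alert's time range is only 30 minutes.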

Hope this helps.
