Splunk alert for peak errors only

shashank_24
Path Finder

Hi, I am writing an alert to monitor a few pages for 500 errors. Investigations are currently under way to fix those errors, so we usually see a few of them every 30 minutes.

So at the moment not every 5xx is a problem, and it is hard for us to tell which of them are "real problems" and which are not.

What I want to do is adjust the thresholds in the Splunk search so that it only recognises peaks, which surely indicate something "going wrong", and only then triggers the alert.

I have an alert like this at the moment, which runs every 30 minutes.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50* 
| stats count by status
| where (status=500 AND count > 10)

 

Right now it is triggered when the count of 500 errors exceeds 10 in the last 30 minutes, but I don't want that because it floods the mailbox unnecessarily.

Is there a way to achieve what I need here, please?


bowesmana
SplunkTrust

@shashank_24 

How do you define a 'peak'?

There are a number of ways to look at this, but you could do a timechart and look for the count per minute rather than the total over 30 minutes, e.g.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=50*
| timechart span=1m count by status
| where '500'>10

which will look at the number of 500s per minute and only alert if there are more than 10 in any one minute.
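If a single busy minute is still too noisy, one possible variation is to require several minutes over the threshold within the search window before alerting. The following is only a sketch under assumptions: the field names errors, per_minute_threshold, min_minutes_over, minutes_over and worst_minute are made up for illustration, and the values 10 and 3 are placeholders to tune against your own data.

```spl
index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| bin _time span=1m
| stats count as errors by _time
| eval per_minute_threshold = 10
| eval min_minutes_over = 3
| where errors > per_minute_threshold
| stats count as minutes_over, max(errors) as worst_minute, max(min_minutes_over) as min_minutes_over
| where minutes_over >= min_minutes_over
```

With the alert condition set to "number of results is greater than 0", this should only fire when at least min_minutes_over separate minutes each exceeded the per-minute threshold.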

Or you could look for outliers, but this would require a bit of tuning for your data.

index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500 
| bin _time span=1m
| stats count by _time
| streamstats window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval lower_bound = avg - (stdev * multiplier)
| eval upper_bound = avg + (stdev * multiplier)
| eval lower_outlier = if(count < lower_bound, 1, 0)
| eval upper_outlier = if(count > upper_bound, 1, 0)
| where upper_outlier=1

So this flags any minute whose count is more than 2 standard deviations above the rolling average, where the average and standard deviation are computed over a window of the last 10 per-minute results (including the current one). Play with the window and multiplier on your historical data to find values that work for you. I left the lower_outlier calculation in there as an example of also picking up values below the lower bound.
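Two things may be worth checking while tuning, shown here as a hedged sketch rather than a drop-in replacement. First, streamstats includes the current row in its window by default, so a spike raises its own baseline; current=f compares each minute against the preceding rows only. Second, bin followed by stats emits no row for minutes with zero errors, whereas timechart fills those minutes with 0, so the 10-row window then really covers 10 minutes. The min_count floor is an assumption added for illustration, so that a very quiet baseline does not flag a single stray error.

```spl
index=myindex sourcetype=ssl_access_combined requested_content="/myapp*" NOT images status=500
| timechart span=1m count
| streamstats current=f window=10 avg(count) as avg, stdev(count) as stdev
| eval multiplier = 2
| eval min_count = 10
| eval upper_bound = avg + (stdev * multiplier)
| where count > upper_bound AND count > min_count
```

The first row has no preceding rows, so its avg and stdev are null and it is dropped by the final where clause. Run the search over a time range somewhat longer than the window so the baseline has enough rows behind it.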

Hope this helps.
