Splunk Search

Help creating/tweaking alert for spike in errors

kmaron
Motivator

We had an issue come up this morning where we suddenly had a HUGE spike in one type of error in our error logs. It normally occurs 1-100 times a day, but in the first hour this morning we had over 1000. We noticed it visually on one of our dashboards, so we could jump in and address it quickly. Yay for that. However, we had to find it visually because at the moment I don't have an alert that does what I'd like it to.

I have an alert set up that uses standard deviation. However, it seems that no matter how I tweak it, I either get so many alerts that it's not useful or I don't get the alerts I need.

Here's my current standard deviation search for the alert:

index=ecm sourcetype="ibm:was:system*" host=PRDFNCM* CIWEB AND Error AND "Exception" NOT "CIWEB.*Plugin" | rex field=_raw ".(?<ExceptionName>\w*Exception)" | bucket _time span=1d | stats count BY _time ExceptionName | eventstats stdev(count) as stdev BY ExceptionName | where count > (3 * stdev)

This morning the standard deviation was calculated as 477.73 and the count was 1377 so it didn't alert.
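For what it's worth, the two numbers above already explain the missing alert: the condition compares the count with 3 * stdev on its own, and 3 * 477.73 is 1433.19, which is more than 1377. A minimal way to check the arithmetic, using makeresults with this morning's values hard-coded (the field names threshold and would_alert are only illustrative):

```spl
| makeresults
| eval count=1377, stdev=477.73
| eval threshold=3 * stdev
| eval would_alert=if(count > threshold, "yes", "no")
```

Two things seem to be at play here. The threshold ignores the average entirely, and eventstats computes the standard deviation over every day in the search range, including the spike day itself, which inflates it.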

Since a normal day for this error is fewer than 100, it seems to me that the standard deviation is off, but I don't know how to fix it.

Any help or advice would be much appreciated.

0 Karma

JDukeSplunk
Builder

We have something like this, specifically for HTTP 500 errors. We normally get around 50 or so an hour. So I set up the alert to simply search for 500s, run stats, add totals, and e-mail if they are over 75.

index=application (host=TTAPPPEGACC*) sourcetype="apollo:prod:tomcat_access" httpcode=500 
|eval host=upper(host) 
|stats count by host 
|addtotals col=true

I then set up the alert screen as shown.

[Screenshot of the alert configuration screen; image not available]
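Since the screenshot does not come through here, one way to express the same "over 75" rule is to put the threshold in the search itself. This is only a sketch and may not match the settings in the screenshot. It assumes the alert runs hourly over the last hour and triggers when the number of results is greater than 0; the field name total is illustrative:

```spl
index=application (host=TTAPPPEGACC*) sourcetype="apollo:prod:tomcat_access" httpcode=500
| stats count AS total
| where total > 75
```

With the where clause in place, the search returns a row only when the total exceeds 75, so the trigger condition can stay generic.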

0 Karma

kmaron
Motivator

I started with something like that, and it works nicely if you know what you're looking for. The problem is that we're monitoring an unknown number of errors, and they all have different 'normal' thresholds. I'm trying not to have to hard-code everything and update the alert every week.
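One possible direction for a threshold that adapts per error, sketched here rather than tested against this data: build the baseline for each ExceptionName from the earlier days only, then compare the most recent day against average plus three standard deviations. The field names latest_day, baseline, avg_count and stdev_count are made up for illustration, the rex pattern is an assumption about how ExceptionName is extracted, and the search is assumed to run over a few weeks of data:

```spl
index=ecm sourcetype="ibm:was:system*" host=PRDFNCM* CIWEB AND Error AND "Exception" NOT "CIWEB.*Plugin"
| rex field=_raw "(?<ExceptionName>\w*Exception)"
| bin _time span=1d
| stats count BY _time ExceptionName
| eventstats max(_time) AS latest_day
| eval baseline=if(_time < latest_day, count, null())
| eventstats avg(baseline) AS avg_count stdev(baseline) AS stdev_count BY ExceptionName
| where _time = latest_day AND count > avg_count + 3 * stdev_count
```

Because baseline is null on the most recent day, that day's count does not feed into its own average or standard deviation. Some caveats: days with zero occurrences produce no row and so are missing from the baseline, and an exception with no earlier history has a null baseline and will not alert under this condition.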

0 Karma

ksharma7
Path Finder

Did you get what you were trying to do?

0 Karma