Splunk Search

Help creating/tweaking alert for spike in errors

Motivator

We had an issue come up this morning: all of a sudden there was a HUGE spike in one type of error in our error logs. We normally see 1-100 of these a day, but in the first hour this morning we had over 1000. We noticed it visually on one of our dashboards, so we were able to jump in and address it quickly. Yay for that. However, we had to find it visually, because at the moment I don't have an alert that does what I'd like it to.

I have an alert set up to look for deviations based on standard deviation. However, it seems that no matter how I tweak it, I either get so many alerts that it isn't useful, or I don't get the alerts I need.

Here's my current standard deviation search for the alert:

index=ecm sourcetype="ibm:was:system" host=PRDFNCM CIWEB AND Error AND "Exception" NOT "CIWEB.Plugin"
| rex field=_raw ".(?<ExceptionName>\w*?Exception)"
| bucket _time span=1d
| stats count BY _time ExceptionName
| eventstats stdev(count) as stdev BY ExceptionName
| where count > (3 * stdev)

This morning the standard deviation was calculated as 477.73 and the count was 1377, so the threshold (3 × 477.73 ≈ 1433) was just above the count and it didn't alert.

Since a normal day for this error is under 100, it seems to me that the standard deviation is off, but I don't know how to fix it.
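One likely cause (an observation, not something stated in the thread): with `eventstats stdev(count)`, the spike day is included in its own baseline, so a single huge count inflates the standard deviation enough to clear its own `3 * stdev` bar. A baseline that a single outlier barely moves, such as the median, is one common workaround. A minimal sketch along those lines, where the `5 *` multiplier is purely illustrative and would need tuning:

index=ecm sourcetype="ibm:was:system" host=PRDFNCM CIWEB AND Error AND "Exception" NOT "CIWEB.Plugin"
| rex field=_raw ".(?<ExceptionName>\w*?Exception)"
| bucket _time span=1d
| stats count BY _time ExceptionName
| eventstats median(count) as med BY ExceptionName
| where count > 5 * med

Because the median of a mostly-normal week stays near the normal daily count even when one day spikes, a 1377-count day against a sub-100 baseline would clear this threshold easily.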

Any help or advice would be much appreciated.


Re: Help creating/tweaking alert for spike in errors

Builder

We have something like this, specifically with HTTP 500 errors. We normally get around 50 or so an hour. So I set up the alert to simply search for 500s, run stats, add totals, and e-mail if the total is over 75.

index=application (host=TTAPPPEGACC*) sourcetype="apollo:prod:tomcat_access" httpcode=500
| eval host=upper(host)
| stats count by host
| addtotals col=true
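If it helps, the over-75 condition can also be pushed into the search itself, so the alert only has to fire on "number of results > 0" (a sketch, not the original setup; it swaps `addtotals` for `eventstats sum` so the grand total is available as a field to filter on):

index=application (host=TTAPPPEGACC*) sourcetype="apollo:prod:tomcat_access" httpcode=500
| eval host=upper(host)
| stats count by host
| eventstats sum(count) as total
| where total > 75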

I then set up the alert screen as shown.

(screenshot of the alert configuration screen)


Re: Help creating/tweaking alert for spike in errors

Motivator

I started with something like that, and it works nicely if you know what you're looking for. The problem is that we're monitoring an unknown number of errors, and they all have different 'normal' thresholds. I'm trying to avoid hard-coding everything and updating the alert every week.


Re: Help creating/tweaking alert for spike in errors

Path Finder

Did you get this working the way you wanted?
