Alerting

How to build an ongoing alert that catches a sudden rise (spike) in a certain error code?

gingersoftware
New Member

Hi Guys,

I could really use an ongoing alert that catches a sudden rise (spike) in a certain error code (such as 404 or 502).
I tried giving this some thought, and... well... I could really use your help 🙂

From my understanding, the search query should "know", or "sense", the normal traffic (I'm not sure over how long a window, maybe 1-2 hours) and alert when there is a spike in the error code compared to 1-2 hours ago.
I think the threshold should be the error code exceeding 5% of total traffic for longer than 90 seconds.

I appreciate your help.


Anam
Community Manager

Hi @gingersoftware

My name is Anam Siddique and I am the Community Content Specialist for Splunk Answers. Please accept the answer that worked for you so other members of the community can benefit from it. If none of the answers have worked for you so far, please post further comments so someone can help you.

Thanks


felipesewaybric
Contributor

The timewrap command will do the trick.
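
For instance, a minimal sketch of the timewrap approach (the tag/host filter follows the search shared later in this thread, and the 1h span and 2x threshold are assumptions; with the default series naming the wrapped columns come out as count_latest_day, count_1day_before, and so on, so verify the exact field names in your results):

tag=NginxLogs host=www1 OR host=www2 status="404"
| timechart span=1h count
| timewrap 1day
| where count_latest_day > 2 * count_1day_before

This compares each hour's 404 count with the same hour one day earlier and keeps only the rows where the latest count is more than double the previous day's.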


woodcock
Esteemed Legend

Check out this INCREDIBLE answer by @mmodestino here:

https://answers.splunk.com/answers/511894/how-to-use-the-timewrap-command-and-set-an-alert-f.html

I heard that he was going to create a blog post or app based on this. What became of that, @mmodestino?


HiroshiSatoh
Champion

I use predictions when I create alerts based on statistical analysis. I find it easier to adjust the prediction parameters to the current situation than to hand-craft the detection logic.

index=(your index) ("404" OR "502" OR ...)
| timechart span=90s count
| predict count as predict algorithm=LL lower95=lower upper95=upper
| where count>'upper(predict)'

Note: the points to adjust are span=90s, upper95, the time range, and (optionally) the algorithm.
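
To run this as a scheduled alert that only evaluates the newest bucket, one possible sketch (the future_timespan=0 option and the tail step are assumptions added here, not part of the answer above):

index=(your index) ("404" OR "502" OR ...)
| timechart span=90s count
| predict count as predict algorithm=LL future_timespan=0 upper95=upper
| tail 1
| where count>'upper(predict)'

future_timespan=0 stops predict from appending empty forecast rows, and tail 1 keeps only the most recent 90-second bucket, so the alert can trigger whenever the search returns a result.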

gingersoftware
New Member

Thanks,

Could you help me modify my current search to fit your approach?

tag=NginxLogs host=www1 OR host=www2
| stats count by status
| eventstats sum(count) as total
| eval perc=round((count/total)*100,2)
| where status="404" AND perc>5

Thanks


Noah_Woodcock
Path Finder

Predictions are the way to go.


HiroshiSatoh
Champion

For example, something like this:

tag=NginxLogs host=www1 OR host=www2
| timechart span=1h count as total, count(eval(status="404")) as count
| eval perc=round((count/total)*100,2)
| fields - count, total
| predict perc as predict algorithm=LL lower95=lower upper95=upper
| where perc>'upper(predict)'

This is just a sample, so please adjust the parameters for your environment and try it.
If you remove the where clause, you can inspect the result on a chart first.
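
If it helps, here is a rough sketch of what saving this as a scheduled alert could look like in savedsearches.conf (the stanza name, cron schedule, and dispatch window are assumptions; the same settings can be made in the UI via Save As > Alert):

# savedsearches.conf - illustrative values only
[nginx_404_spike]
search = tag=NginxLogs host=www1 OR host=www2 \
| timechart span=1h count as total, count(eval(status="404")) as count \
| eval perc=round((count/total)*100,2) \
| fields - count, total \
| predict perc as predict algorithm=LL upper95=upper \
| where perc>'upper(predict)'
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -24h
dispatch.latest_time = now
alert_type = number of events
alert_comparator = greater than
alert_threshold = 0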
