
How to trigger an alert when the HTTP error code count increases by 5% for 3 consecutive minutes?

mlui_2

Hi guys

How do I create an alert that triggers on the following criteria?

A 5% increase in error status codes for 3 consecutive minutes should be reported as "Warning". A 5% increase for 5 consecutive minutes should be reported as "Error".

The base search is something like:

index=apacheaccesslogs | fields status | timechart span=1m count by status

Thanks in advance


jacobpevans

Greetings @mlui_2,

Please take a look at these run-anywhere searches. This sounds like a perfect fit for the transpose command (https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose). The makeresults lines only generate fake data, so in your real search use count instead of sum(count).

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=102 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=103 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=104 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=106 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", (100*('row 4' - 'row 1') / 'row 1'),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", (100*('row 6' - 'row 1') / 'row 1'),"N/A")
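To check the arithmetic, here is a small Python model (not SPL) of what the two eval lines compute. After `timechart span=1m` and `transpose 0`, 'row 1' is the oldest one-minute bucket, so the 3-minute figure compares the fourth bucket with the first and the 5-minute figure compares the sixth with the first. The function name `percent_increase` is hypothetical and used only for illustration.

```python
def percent_increase(counts, minutes):
    """Percent change from the oldest bucket to the bucket `minutes` later.

    `counts` holds per-minute counts ordered oldest first, mirroring
    'row 1' .. 'row 6' after `timechart span=1m ... | transpose 0`.
    """
    first = counts[0]
    later = counts[minutes]  # corresponds to 'row (minutes + 1)' in the SPL
    return 100 * (later - first) / first


# Fake counts from the first run-anywhere search (now-5m .. now).
counts = [101, 102, 103, 104, 105, 106]
pct_3 = percent_increase(counts, 3)  # 'row 4' vs 'row 1'
pct_5 = percent_increase(counts, 5)  # 'row 6' vs 'row 1'
print(round(pct_3, 2), round(pct_5, 2))  # 2.97 4.95
```

Both values stay under 5 for this data. Note that this measures the cumulative increase across the window relative to the first minute, not a minute-over-minute increase.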

And here's what the full alert would look like:

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=110 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=115 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=120 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=125 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", round((100*('row 4' - 'row 1') / 'row 1'), 2),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", round((100*('row 6' - 'row 1') / 'row 1'), 2),"N/A")
| eval Alert_Type = case (Percent_Increase_5_Mins>5,"Error",
                          Percent_Increase_3_Mins>5,"Warning")
| where isnotnull(Alert_Type)
Cheers,
Jacob
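The alert logic in the full search can be modeled in Python (not SPL) to see which label a given series gets. case() returns the first branch that matches, so "Error" takes priority over "Warning" when both thresholds are exceeded, and a series matching neither yields null, which `where isnotnull(Alert_Type)` drops. The function name `alert_type` and the third test series are hypothetical.

```python
def alert_type(counts):
    """Return "Error", "Warning", or None, checking branches in the SPL case() order.

    `counts` holds six per-minute error counts, oldest first ('row 1'..'row 6').
    """
    first = counts[0]
    pct_3 = round(100 * (counts[3] - first) / first, 2)  # 'row 4' vs 'row 1'
    pct_5 = round(100 * (counts[5] - first) / first, 2)  # 'row 6' vs 'row 1'
    # case() stops at the first true condition, so "Error" wins if both apply.
    if pct_5 > 5:
        return "Error"
    if pct_3 > 5:
        return "Warning"
    return None  # no branch matched: the row is filtered out by isnotnull


print(alert_type([101, 105, 110, 115, 120, 125]))  # Error   (13.86% / 23.76%)
print(alert_type([101, 102, 103, 104, 105, 106]))  # None    (2.97% / 4.95%)
print(alert_type([100, 103, 105, 106, 104, 104]))  # Warning (6.0% / 4.0%)
```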


dmarling

Is your 5% increase based on each subsequent minute, so that you are alerting on exponential growth, or on some other aggregate?


mlui_2

Based on the requirement I got, it is based on each subsequent minute.

But this could lead to false-positive alerts, so I'm open to suggestions on how the alert should be designed.
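Under the per-minute reading, each minute has to exceed the previous minute by more than 5%, which differs from comparing against the first bucket of the window. The Python model below (not SPL) is a sketch of that interpretation, assuming a zero-count minute breaks the streak and that only the streak ending at the most recent minute matters. The function name and its parameters are hypothetical.

```python
def consecutive_increase_alert(counts, threshold_pct=5.0, warn_after=3, error_after=5):
    """Per-minute interpretation: every minute must beat the previous one by
    more than threshold_pct percent.

    counts: per-minute error counts, oldest first.
    Returns "Error", "Warning", or None based on the length of the streak of
    qualifying minutes that ends at the most recent minute.
    """
    run = 0
    for prev, cur in zip(counts, counts[1:]):
        # Assumption: a zero-count minute cannot define a percent increase, so it breaks the streak.
        if prev > 0 and 100 * (cur - prev) / prev > threshold_pct:
            run += 1
        else:
            run = 0  # streak broken; start counting again
    if run >= error_after:
        return "Error"
    if run >= warn_after:
        return "Warning"
    return None


print(consecutive_increase_alert([100, 106, 112, 118]))            # Warning
print(consecutive_increase_alert([100, 110, 121, 133, 147, 162]))  # Error
print(consecutive_increase_alert([100, 100, 100, 130]))            # None
```

The last series shows the trade-off: a single 30% jump never builds a streak here, although a window comparison against the first minute would flag it. Requiring every minute to rise is strict, which helps against one-off spikes but can miss sudden step changes.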
