Alerting

Fire a trigger when the HTTP error status code count increases 5% for 3 consecutive minutes

mlui_2
Explorer

Hi guys

how do I create an alert trigger with the following criteria?

Error status code count increases 5% for 3 consecutive minutes: report as "Warning". Increases 5% for 5 consecutive minutes: report as "Error".

base search is something like

index=apacheaccesslogs | fields status | timechart span=1m count by status

Thanks in advance

1 Solution

jacobpevans
Motivator

Greetings @mlui_2,

Please take a look at these run-anywhere searches. This sounds like a perfect fit for the transpose (https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose) command. Yours will still use count instead of sum(count).

  | makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=102 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=103 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=104 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=106 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", (100*('row 4' - 'row 1') / 'row 1'),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", (100*('row 6' - 'row 1') / 'row 1'),"N/A")

And here's what the full alert would look like:

  | makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=110 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=115 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=120 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=125 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", round((100*('row 4' - 'row 1') / 'row 1'), 2),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", round((100*('row 6' - 'row 1') / 'row 1'), 2),"N/A")
| eval Alert_Type = case (Percent_Increase_5_Mins>5,"Error",
                          Percent_Increase_3_Mins>5,"Warning")
| where isnotnull(Alert_Type)
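
For the original base search, the adaptation might look something like the sketch below. It assumes 5xx status codes are the errors of interest, and uses match() on the transposed column name to restrict the percent-increase calculation to those rows; adjust the pattern (and thresholds) to match your actual status values. Note that if 'row 1' is zero, the division is undefined, so a guard may be needed in practice.

index=apacheaccesslogs
| timechart span=1m count by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(match(column, "^5"), round((100*('row 4' - 'row 1') / 'row 1'), 2), "N/A")
| eval Percent_Increase_5_Mins = if(match(column, "^5"), round((100*('row 6' - 'row 1') / 'row 1'), 2), "N/A")
| eval Alert_Type = case(Percent_Increase_5_Mins>5, "Error",
                         Percent_Increase_3_Mins>5, "Warning")
| where isnotnull(Alert_Type)

Schedule this over the last 6 minutes with span=1m so that 'row 1' through 'row 6' line up with the six one-minute buckets.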
Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.


dmarling
Builder

Is your 5% increase based on each subsequent minute (so you're alerting on exponential growth), or on some other aggregate?

If this comment/answer was helpful, please up vote it. Thank you.

mlui_2
Explorer

Based on the requirement I was given, it is based on each subsequent minute.

But this could lead to false-positive alerts. I'm open to suggestions on how the alert should work.
