Alerting

Fire an alert trigger when the HTTP error status code count increases 5% for 3 consecutive minutes

mlui_2
Explorer

Hi guys

How do I create an alert trigger with the following criteria?

Error status code count increases 5% for 3 consecutive minutes: report as "Warning". Increases 5% for 5 consecutive minutes: report as "Error".

base search is something like

index=apacheaccesslogs | fields status | timechart span=1m count by status

Thanks in advance

1 Solution

jacobpevans
Motivator

Greetings @mlui_2,

Please take a look at these run-anywhere searches. This sounds like a perfect use case for the transpose (https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose) command. Your version will use count instead of sum(count); the sum(count) below is only needed because the sample data is generated with makeresults and append.

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=102 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=103 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=104 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=106 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", (100*('row 4' - 'row 1') / 'row 1'),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", (100*('row 6' - 'row 1') / 'row 1'),"N/A")

And here's what the full alert would look like:

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=110 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=115 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=120 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=125 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", round((100*('row 4' - 'row 1') / 'row 1'), 2),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", round((100*('row 6' - 'row 1') / 'row 1'), 2),"N/A")
| eval Alert_Type = case (Percent_Increase_5_Mins>5,"Error",
                          Percent_Increase_3_Mins>5,"Warning")
| where isnotnull(Alert_Type)
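
Adapting the full alert to your base search might look like this (a sketch, not tested against your data; the 5xx status filter and the 6-minute time window are assumptions on my part):

index=apacheaccesslogs status>=500 earliest=-6m@m latest=@m
| timechart span=1m count
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="count", round((100*('row 4' - 'row 1') / 'row 1'), 2), "N/A")
| eval Percent_Increase_5_Mins = if(column="count", round((100*('row 6' - 'row 1') / 'row 1'), 2), "N/A")
| eval Alert_Type = case(Percent_Increase_5_Mins>5, "Error",
                         Percent_Increase_3_Mins>5, "Warning")
| where isnotnull(Alert_Type)

Note that without "by status" the timechart output field is named count, so the transpose rows are matched on column="count" instead of column="Error". Schedule the alert to run every minute and trigger when the number of results is greater than zero.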
Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.



dmarling
Builder

Is your 5% increase based on each subsequent minute, so that you are alerting on exponential growth, or on some other aggregate?

If this comment/answer was helpful, please up vote it. Thank you.

mlui_2
Explorer

Based on the requirement I got, it is based on each subsequent minute.

But this could lead to false-positive alerts. I'm open to suggestions on how the alert should work.
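
One possible alternative (a sketch of my own, not from the thread) is to flag each minute-over-minute increase with streamstats and alert only when every minute in the window increased, which avoids firing on a single spike followed by a flat period. The 5xx status filter and thresholds here are assumptions:

index=apacheaccesslogs status>=500
| timechart span=1m count
| streamstats current=f window=1 last(count) as prev_count
| eval increased = if(count > prev_count * 1.05, 1, 0)
| streamstats window=3 sum(increased) as inc_3m
| streamstats window=5 sum(increased) as inc_5m
| eval Alert_Type = case(inc_5m=5, "Error", inc_3m=3, "Warning")
| where isnotnull(Alert_Type)

Here inc_3m=3 means each of the last 3 minutes was at least 5% above the minute before it, which matches the "each subsequent minute" reading of the requirement.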
