Alerting

Trigger an alert when HTTP error status codes increase 5% for 3 consecutive minutes

mlui_2
Explorer

Hi guys

How do I create an alert trigger with the following criteria?

Error status code count increases 5% for 3 consecutive minutes: report as "Warning". Increases 5% for 5 consecutive minutes: report as "Error".

The base search is something like:

index=apacheaccesslogs | fields status | timechart span=1m count by status

Thanks in advance

1 Solution

jacobpevans
Motivator

Greetings @mlui_2,

Please take a look at these run-anywhere searches. This sounds like a perfect fit for the transpose (https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Transpose) command. Your search will still use count instead of sum(count).

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=102 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=103 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=104 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=106 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", (100*('row 4' - 'row 1') / 'row 1'),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", (100*('row 6' - 'row 1') / 'row 1'),"N/A")

And here's what the full alert would look like:

| makeresults | eval _time=now()-(60*5), status="Error", count=101
| append [ | makeresults | eval _time=now()-(60*4), status="Error", count=105 ]
| append [ | makeresults | eval _time=now()-(60*3), status="Error", count=110 ]
| append [ | makeresults | eval _time=now()-(60*2), status="Error", count=115 ]
| append [ | makeresults | eval _time=now()-(60*1), status="Error", count=120 ]
| append [ | makeresults | eval _time=now()-(60*0), status="Error", count=125 ]
| timechart span=1m sum(count) by status
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", round((100*('row 4' - 'row 1') / 'row 1'), 2),"N/A")
| eval Percent_Increase_5_Mins = if(column="Error", round((100*('row 6' - 'row 1') / 'row 1'), 2),"N/A")
| eval Alert_Type = case (Percent_Increase_5_Mins>5,"Error",
                          Percent_Increase_3_Mins>5,"Warning")
| where isnotnull(Alert_Type)
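
To tie this back to the base search in the question, here's a rough, untested sketch. It assumes the Apache status field is numeric and that anything 500 or above should count as an "Error" (adjust that to your own definition of an error status), and it runs over exactly the last six full minutes so 'row 1' through 'row 6' line up with the six one-minute buckets after transpose 0:

index=apacheaccesslogs earliest=-6m@m latest=@m
| eval status_class=if(tonumber(status) >= 500, "Error", "OK") ``` assumption: 5xx responses count as errors ```
| timechart span=1m count by status_class
| convert ctime(_time)
| transpose 0
| eval Percent_Increase_3_Mins = if(column="Error", round((100*('row 4' - 'row 1') / 'row 1'), 2), "N/A")
| eval Percent_Increase_5_Mins = if(column="Error", round((100*('row 6' - 'row 1') / 'row 1'), 2), "N/A")
| eval Alert_Type = case(Percent_Increase_5_Mins>5, "Error",
                         Percent_Increase_3_Mins>5, "Warning")
| where isnotnull(Alert_Type)

If you schedule it to run every minute and trigger when the number of results is greater than zero, the Alert_Type field tells you whether the Warning or the Error threshold was crossed.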
Cheers,
Jacob

If you feel this response answered your question, please do not forget to mark it as such. If it did not, but you do have the answer, feel free to answer your own post and accept that as the answer.



dmarling
Builder

Is your 5% increase based on each subsequent minute (so it's exponential growth you are alerting on), or some other aggregate?

If this comment/answer was helpful, please up vote it. Thank you.

mlui_2
Explorer

Based on the requirement I got, it is based on each subsequent minute.

But this could lead to false positive alerts. I'm open to suggestions on how the alert should be structured.
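
If the requirement really is a 5% increase in each subsequent minute, one way to express that literally is to compare every minute against the previous one and count how many of the most recent buckets exceeded 5%. This is a rough, untested sketch rather than the accepted answer's approach; error_count, prev_count, increased, and the 5xx-as-error assumption are all illustrative:

index=apacheaccesslogs earliest=-6m@m latest=@m
| eval status_class=if(tonumber(status) >= 500, "Error", "OK") ``` assumption: 5xx responses count as errors ```
| timechart span=1m count(eval(status_class="Error")) as error_count
| streamstats current=f window=1 last(error_count) as prev_count ``` previous minute's error count ```
| eval increased=if(prev_count>0 AND error_count > prev_count*1.05, 1, 0)
| streamstats window=3 sum(increased) as increases_3m
| streamstats window=5 sum(increased) as increases_5m
| tail 1 ``` keep only the most recent minute ```
| eval Alert_Type=case(increases_5m>=5, "Error",
                       increases_3m>=3, "Warning")
| where isnotnull(Alert_Type)

Scheduled every minute and triggered when the number of results is greater than zero, this reports a Warning only when each of the last 3 minutes grew more than 5% over the minute before it, and an Error when that holds for each of the last 5 minutes.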
