I have an event that is received once every 2 minutes. I am trying to set up an alert that fires if the Value for the event stays above a certain threshold for 15 minutes or more. I am using the query below.
index=x host=y | where Value > Threshold | sort _time | bin _time span=16m | stats count by host _time | where count > 6 | eval count = count * 2
Does the above query need any changes to work?
Thanks in advance.
Some minor edits:
index= x host = y Value > Threshold
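# moved Value > Threshold up into the base search; this works when Threshold is a literal number, but if Threshold is a field on the event, keep the comparison in a where clause, since the base search compares a field against a literal value only. You also probably want to filter to a very specific set of logs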
# why do you need the sort? Logs are already sorted _time descending by default
| bin _time span = 16m
| stats count by host, _time
# added a comma for readability
|where count > 7
# shouldn't this be 7? With one event every 2 minutes, a 16-minute bucket holds 8 events, and you'd want all 8 to be above the threshold
|eval count = count *2
# why do you need this line?
There are other ways to do this, for example grabbing the earliest and latest times at which the value exceeded the threshold and taking the difference. I would also urge you to get comfortable testing your alerts. In this case, lower the threshold and check whether, for example, a threshold of 0 returns the complete result set for all the hosts you would expect to see.
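As a rough sketch of that test, assuming Value is always greater than 0 so that every event passes the filter, you could run the bucketed search without the final where and look at the raw counts. The index and host names are the same placeholders used above.

```
index=x host=y Value>0
| bin _time span=16m
| stats count by host, _time
```

If events really arrive every 2 minutes, each complete 16-minute bucket should show a count of about 8 per host. Partial buckets at the edges of the search time range will show fewer, which is worth keeping in mind when the real alert compares count against 7.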
Hope this helps!
Thank you for the help.
How would I go about grabbing the earliest and latest times at which the Value exceeded the threshold and taking the difference?
| sort _time
# this is to make it easier for the application team to read the logs when they open the alert, so that all the events are in ascending order
| bin _time span = 16m
| stats count by host _time
| where count > 7
# yes, it should be 7
| eval count = count * 2
# only to display the number of minutes the value was above the threshold
| rename count AS "Minutes Over Threshold" host AS Host
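On the earliest/latest question, here is a minimal sketch rather than a definitive answer. It assumes the alert runs over a short window (roughly the last 16 minutes), that Threshold is a literal number, and that one event arrives every 2 minutes. The field names first_over, last_over and minutes_over are made up for illustration. Lines starting with # are notes in the same style as the annotated query earlier in the thread, not part of the search, so remove them before running.

```
index=x host=y Value>Threshold
# only events already over the threshold reach stats
| stats earliest(_time) AS first_over latest(_time) AS last_over count by host
# _time is in epoch seconds, so the difference divided by 60 is minutes
| eval minutes_over = round((last_over - first_over) / 60)
# 8 readings taken 2 minutes apart span 14 minutes from first to last
| where minutes_over >= 14 AND count >= 8
| eval first_over = strftime(first_over, "%Y-%m-%d %H:%M:%S"), last_over = strftime(last_over, "%Y-%m-%d %H:%M:%S")
| rename minutes_over AS "Minutes Over Threshold", host AS Host
```

The count check is there because the time difference alone does not prove the value stayed over the threshold the whole time: two exceedances 14 minutes apart with normal readings in between would produce the same difference. Unlike bin, this version does not depend on where the fixed 16-minute bucket boundaries fall, but it does depend on the search time range you give the alert.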