Alerting

Alerts not triggering, even though results are present

jonboerner
New Member

Hi,

I've been experiencing some issues with alerts triggering. I have a number of alerts (5-10), and while most of them are triggering as expected, some of them are not. I had thought that maybe I created the alerts incorrectly, but today it just so happened that our Splunk instance crashed, was restarted, and all of a sudden the alerts that had previously not been coming through started working!

Now my concern is: if those alerts are working now, does that mean other alerts that were previously working have stopped?

I assume this has something to do with some kind of job limit, but I haven't been able to find good information on how any such limits work, or how I can confirm that this is actually the issue. My team's admin has told me that I am under whatever limits apply. I also haven't found good information on how to tell whether an alert job will fire given this hypothetical limit, or how to choose which alerts get triggered if I am over it.

Any thoughts? Thanks.

0 Karma
1 Solution

woodcock
Esteemed Legend

Check out this similar Q&A:
https://answers.splunk.com/answers/401841/why-is-a-triggered-alert-reporting-286-events-300.html#ans...

The problem is that your alert is real-time and there is surely latency in the delivery of your events into Splunk. At the time the alert fired, some events which will eventually fall into the real-time window HAVE NOT ARRIVED INTO SPLUNK YET. This is one of MANY reasons not to ever use real-time. If you run your search every 2 minutes from -10m@m to -5m@m, you will have a nice alert that performs WAY better resource-wise and also is FAR more accurate. The amount of time to "wait" (in this case 5 minutes) depends on the latency of your events, which can be calculated like this:

... | eval lagSecs = _indextime - _time | stats min(lagSecs) max(lagSecs) avg(lagSecs) by host source
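As a sketch only, the scheduled version described above might look something like this in savedsearches.conf (the stanza name and trigger settings here are hypothetical; the schedule and time window follow the advice above):

```
# Hypothetical stanza name, for illustration only
[my_scheduled_alert]
enableSched = 1
# Run every 2 minutes
cron_schedule = */2 * * * *
# Search a closed 5-minute window, lagged 5 minutes behind "now"
dispatch.earliest_time = -10m@m
dispatch.latest_time = -5m@m
# Fire whenever the search returns any results
counttype = number of events
relation = greater than
quantity = 0
```

The 5-minute lag is only appropriate if the lag search shows your events arrive within 5 minutes; otherwise widen it accordingly.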

View solution in original post


jonboerner
New Member

I will certainly give this a try. I'm not sure I understand the flow of how events are logged, sent to Splunk, and indexed, or where in that process the alerts come into play. Hopefully I will understand a bit more as I look at the latency and try a few things out.

0 Karma

woodcock
Esteemed Legend

The point is that real-time is almost always impractical, impossible, or indeterminate for many reasons, the chief of which is latency. This search will probably show you that your latency is too high for real-time. If you are watching a live 5-minute window but many or most of your events take longer than 5 minutes to get into Splunk, then those events will never appear inside the window while the alert is evaluating it.
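One way to quantify this is a sketch building on the lag search above; the 300-second threshold matches a 5-minute window and is an assumption, so adjust it to your own window size:

```
... | eval lagSecs = _indextime - _time
| stats count(eval(lagSecs <= 300)) AS within_5m, count AS total
| eval pct_within_window = round(100 * within_5m / total, 1)
```

If pct_within_window is well below 100, a real-time window of that size will routinely miss events.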

jonboerner
New Member

Ah, I think I'm getting a better understanding here. For one of my failing alerts, the events that make up the alert occur on average 30 minutes apart, so a 5-minute window obviously would not cover that case. I had not realized that the window limits which events the alert searches over.

I wouldn't be surprised if this was the case for all my alerts that are failing.

Thanks!
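For events that arrive roughly 30 minutes apart, the same scheduled-alert pattern just needs a wider, lagged window. A hypothetical sketch (all values illustrative; the 10-minute lag should be set from the measured latency):

```
# Hypothetical stanza name, for illustration only
[hypothetical_30min_alert]
enableSched = 1
# Run every 30 minutes
cron_schedule = */30 * * * *
# A 60-minute window, held back 10 minutes so late events can arrive
dispatch.earliest_time = -70m@m
dispatch.latest_time = -10m@m
```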

0 Karma

somesoni2
Revered Legend

Can you provide more details about the alerts, their search time range, their cron schedule etc?

0 Karma

jonboerner
New Member

Absolutely I can.

These are all real-time alerts and most of the queries are pretty simple. 2 of the jobs have "complex" queries, and one of those 2 has a pretty big inner join in it.

Let me know what other info might be helpful here and I'd be happy to provide it as well.

0 Karma