Alerting

Alerts not triggering, even though results are present

jonboerner
New Member

Hi,

I've been experiencing some issues with alerts triggering. I have a number of alerts (5-10), and while most of them are triggering as expected, some of them are not. I had thought that maybe I created the alerts incorrectly, but today it just so happened that our Splunk instance crashed, was restarted, and all of a sudden the alerts that had previously not been coming through started working!

Now my concern is: if those alerts are working now, does that mean other alerts that were previously working have stopped?

I assume this has something to do with some kind of job limit, but I haven't been able to find good information on how any such limits work, or how I can confirm that this is actually the issue. My team's admin has told me that I am under whatever limits apply. I also haven't found good information on how to tell whether an alert job will fire given this hypothetical limit, or how to choose which alerts get triggered if I am over it.

Any thoughts? Thanks.

0 Karma
1 Solution

woodcock
Esteemed Legend

Check out this similar Q&A:
https://answers.splunk.com/answers/401841/why-is-a-triggered-alert-reporting-286-events-300.html#ans...

The problem is that your alert is real-time and there is surely latency in the delivery of your events into Splunk. At the time the alert fired, some events which will eventually fall into the real-time window HAVE NOT ARRIVED INTO SPLUNK YET. This is one of MANY reasons not to ever use real-time. If you run your search every 2 minutes from -10m@m to -5m@m, you will have a nice alert that performs WAY better resource-wise and also is FAR more accurate. The amount of time to "wait" (in this case 5 minutes) depends on the latency of your events, which can be calculated like this:

... | eval lagSecs = _indextime - _time | stats min(lagSecs) max(lagSecs) avg(lagSecs) by host source
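As a sketch only, the scheduled version described above might look something like this in savedsearches.conf (the stanza name and trigger settings here are hypothetical; the schedule and time window follow the advice above):

```
# Hypothetical stanza name, for illustration only
[my_scheduled_alert]
enableSched = 1
# Run every 2 minutes
cron_schedule = */2 * * * *
# Search a closed 5-minute window, lagged 5 minutes behind "now"
dispatch.earliest_time = -10m@m
dispatch.latest_time = -5m@m
# Fire whenever the search returns any results
counttype = number of events
relation = greater than
quantity = 0
```

The 5-minute lag is only appropriate if the lag search shows your events arrive within 5 minutes; otherwise widen it accordingly.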

View solution in original post


jonboerner
New Member

I will certainly give this a try. I'm not sure I understand the flow of how events are logged, sent to Splunk, and indexed, or where in that process the alerts come into play. Hopefully I will understand a bit more as I look at the latency and try a few things out.

0 Karma

woodcock
Esteemed Legend

The point is that real-time is almost always impractical, impossible, or indeterminate for many reasons, the chief of which is latency. This search will probably show you that your latency is too high for real-time. If you are watching a live 5-minute window but many or most of your events take longer than 5 minutes to get into Splunk, then those events will never appear inside the window while the alert is evaluating it.
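One way to quantify this is a sketch building on the lag search above; the 300-second threshold matches a 5-minute window and is an assumption, so adjust it to your own window size:

```
... | eval lagSecs = _indextime - _time
| stats count(eval(lagSecs <= 300)) AS within_5m, count AS total
| eval pct_within_window = round(100 * within_5m / total, 1)
```

If pct_within_window is well below 100, a real-time window of that size will routinely miss events.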

jonboerner
New Member

Ah, I think I'm getting a better understanding here. For one of my failing alerts, the events that make up the alert occur on average 30 minutes apart, so a 5-minute window obviously would not cover that case. I had not realized that the window limits which events the alert searches over.

I wouldn't be surprised if this was the case for all my alerts that are failing.

Thanks!
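For events that arrive roughly 30 minutes apart, the same scheduled-alert pattern just needs a wider, lagged window. A hypothetical sketch (all values illustrative; the 10-minute lag should be set from the measured latency):

```
# Hypothetical stanza name, for illustration only
[hypothetical_30min_alert]
enableSched = 1
# Run every 30 minutes
cron_schedule = */30 * * * *
# A 60-minute window, held back 10 minutes so late events can arrive
dispatch.earliest_time = -70m@m
dispatch.latest_time = -10m@m
```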

0 Karma

somesoni2
Revered Legend

Can you provide more details about the alerts, their search time range, their cron schedule etc?

0 Karma

jonboerner
New Member

Absolutely I can.

These are all real-time alerts and most of the queries are pretty simple. 2 of the jobs have "complex" queries, and one of those 2 has a pretty big inner join in it.

Let me know what other info might be helpful here and I'd be happy to provide it as well.

0 Karma