I'm trying to trigger a scripted action based on specific Windows services starting and stopping. I set up a realtime saved search to detect the condition, trigger my scripted action, and send me an email. The problem is that when the condition occurs, I get pummeled with emails and the triggered action gets launched repeatedly (several times a minute). The same event is alerted upon multiple times.
The realtime saved search is configured to accept anything from the past 10 minutes, i.e. rt-10m. (During a reboot, it's possible that a monitored Windows service could be shut down after splunkd, and therefore the shutdown event wouldn't be forwarded until the system is back up.) The alert action gets triggered repeatedly for the same event during that 10-minute window.
On similar questions, others have suggested using the alert throttle. But that effectively disables the alert temporarily, which also prevents new events from being seen. For example, if the service comes back online in under 10 minutes, the "service-up" event would be suppressed by the throttle. Losing events isn't acceptable in this use case.
I need something like a trigger-level dedup that just gives me a single copy of each event!
Note: I'm currently running Splunk 4.2, but I've heard about "Per-result alerting" in 4.3. I'm not sure that helps me here, but any feedback regarding this is welcomed.
If you aren't still on Splunk 4.2, per-result alerting now allows you to suppress based on a field.
So you could suppress based on service - i.e. suppressing alerts on one service needn't mean suppressing alerts on other services.
See http://docs.splunk.com/Documentation/Splunk/5.0.4/Alert/Defineper-resultalerts
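As a rough sketch of what that could look like in savedsearches.conf (untested; the stanza name and the field names host, service_name and service_state are made up for illustration, so substitute whatever your search actually extracts):

```
# savedsearches.conf -- sketch only; stanza and field names are hypothetical
[Windows service state change]
# Alert once per result instead of once per batch of results
alert.digest_mode = 0
# Turn on throttling, keyed on field values rather than on the whole alert
alert.suppress = 1
alert.suppress.fields = host,service_name,service_state
# Match the 10 minute realtime window so a repeated copy of the same event stays quiet
alert.suppress.period = 10m
```

Throttling should then apply per combination of those field values. If the state is one of the fields, a "service-up" result ought to have a different key than the preceding "service-down" result for the same service and so shouldn't be swallowed by it, but I'd verify that on a test box before relying on it.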