I originally posted this because our alerts weren't working, and I wanted to confirm the syntax for multiple recipients. It seems that our alerts still aren't working (we're not getting email notifications, and nothing shows in the alert manager). One of the comments posted in the other question was that all-time real-time (rt / rt) alerts should not be configured, and we had a number of them. So what is the best way to configure real-time searches then? Our use case is that we want to be notified as soon as certain events occur.
I went into all the "rt rt" searches and changed them to "rt-1m / rt-0m" time frames, with condition "always" and alert mode "per-result" with some relevant field throttling, but after running some tests, we're not getting the notifications as expected.
I'm considering combining all of our rt/rt searches into one monster query (we had about 15-odd searches) with the use of ()'s and ANDs / ORs, so that one search matches all (although identifying which condition triggered it from the subject will be a nightmare, unless we have some crazy eval + case to inject a label).
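The eval + case idea could look roughly like the sketch below. The index name, the two search phrases, and the label values are all made up for illustration; only eval, case(), like(), and true() are standard SPL.

```
index=app_logs ("database connection failed" OR "user limit exceeded")
| eval alert_label=case(
    like(_raw, "%database connection failed%"), "DB connection error",
    like(_raw, "%user limit exceeded%"), "User limit exceeded",
    true(), "Unclassified")
| table _time host alert_label _raw
```

Each result then carries an alert_label field that says which condition matched. On Splunk versions that support result tokens in email actions, the subject could probably reference it as $result.alert_label$, but I have not checked that against the version in use here.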
What is the best approach for configuring searches to notify email addresses as certain events occur?
I don't think it's at the SMTP level, because I have tracking enabled, and the all-time / real-time (rt rt) searches weren't even showing in the alert manager.
The alert condition SHOULD match an event: if I open the search from the "Searches and Reports" drop-down, I can see the events showing. However, it's something to do with the rt/rt config that seems to be breaking it. I've been fiddling around, and am busy configuring a specific test case to check what happens.
It doesn't depend on whether you have 15 or 20 real-time searches; it's about how they are configured.
Are you getting any mail for any of the configured alerts?
If not, these are the possible causes:
The sendmail.py file which sends the mail may be corrupt.
The alert condition doesn't match any event.
The throttling parameter is not an actual field name.
The SMTP server is not configured correctly.
A simple way to test from the Search app:
...| sendemail to=email@example.com server=smtp_server sendresults=true format=html inline=true
Test it under http://server:8000/en-US/app/App/flashtimeline
Of the 15 searches, the configuration depends on the search.
For example, one alert, which should notify us when a user of our application triggers a certain condition, is set to "Once per result" with 1-hour throttling based on UserID.
However, we have another alert that monitors logs from the application to the database. We don't want to throttle this event, though: every time the application has an error connecting to the database, we want it to email us. We currently have rt-1h to rt-0 with a condition of "number of events" > 0 and 1-hour throttling based on "host".
I would like to know the search and the throttling parameters. Real-time alerts do work fine; I struggled with them at first, but I got them working precisely. So please post the search and the condition so that we can take a look. A screenshot would probably help.
Splunk is not the ideal tool for literal "real-time" alerting. If you need truly real-time alerting, you need a real-time monitoring platform (Nagios or similar under Linux, for instance). That said, unless you are using something like SNMP traps to initiate alerts, nothing is ever truly real time, as you are inevitably relying on regular polling of whatever conditions you are monitoring, even if that polling is something like once a second.
The best you can really achieve with Splunk is regular searches running at short intervals over short time spans (e.g. scheduled to run every minute, and only cover a span of a minute - or possibly two just to ensure overlap and that nothing falls through the cracks).
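As a rough sketch of such a short-span search, something like the following could be saved and scheduled to run every minute (cron * * * * *). The index name and search phrase are placeholders, not taken from your setup; earliest and latest are the standard inline time modifiers.

```
index=app_logs "error connecting to database" earliest=-2m@m latest=@m
| stats count AS error_count latest(_time) AS last_seen BY host
```

Because each run covers two minutes, consecutive runs overlap by one minute, so the same event can match twice. Throttling the alert on a suitable field is what keeps that overlap from producing duplicate emails.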
Really, it comes down to just how instantaneous you need your alert to be. After all, if you are relying on e-mail alerts you could conceivably fall foul of delivery delays.
Here's a thought: you could consider integrating Splunk with Nagios passive checks and rely on that engine to handle the actual alerting. I have not done it myself, but I know it has been done.
We originally used Nagios / Zabbix as our monitoring system. Those tools are great for OS / platform monitoring (although the *nix app works pretty well in Splunk too). We've tried to consolidate our logging in Splunk (instead of managing more than one app), so for now we are looking to get close to real-time monitoring. By "close", I mean notified within a minute or two (immediate is not necessary).
So with that in mind, would using a -2m / 0 range scheduled to run every minute, with a 1-minute suppression based on a unique field (e.g. a run id for a job), be a good approach?