I am in the process of setting up a proof-of-concept Splunk environment that will replace our current alerting system. We currently use a combination of syslog and swatch (syslog watcher) to alert on error codes across our applications (via email to a number of different recipients, depending on the alert). We have about 15 different applications that can generate a total of about 900 unique alert codes. One of the main issues with our current system is that it cannot do any velocity checking on alerts (e.g. only alert if there are 3 ERR_101 alerts within a set amount of time).
I can achieve the above if I take a small subset of the error codes and set up an alert with the trigger being number of occurrences per x minutes.
The problem is that when I try to scale this up it gets very bloated and hard to manage, and it ends up requiring several different real-time searches (which would hurt performance).
I want to build (without re-inventing the wheel too much) something that lets me tune, per alert, both the email recipient and the number of occurrences within a configurable time frame that triggers the alert.
Taking ERR_002 from the table below as an example: if there are 3 occurrences of this error in 60 minutes, an email is sent to firstname.lastname@example.org.
I am not looking for a complete answer to this problem, just a bit of guidance into how I would go about achieving this within Splunk. I have investigated lookup tables but have been unable to use values in the table to customise the alert.
Any guidance would be much appreciated.
And if you want each result to have its own destination, you could add a layer to your search that appends the email destination to the events (with a lookup or some eval logic), then have your script iterate over each line of the results.
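To sketch the script part: with a legacy Splunk scripted alert action, Splunk passes the path to a gzipped CSV of the search results as the 8th command-line argument. The field names below (`error_code`, `count`, `email`) are assumptions matching a lookup that appends the destination to each result row, and the sender address and local MTA are placeholders:

```python
#!/usr/bin/env python
# Sketch of a Splunk alert script that emails each recipient once,
# listing every error code routed to them. Field names and addresses
# are assumptions, not taken from the original environment.
import csv
import gzip
import smtplib
import sys
from collections import defaultdict
from email.mime.text import MIMEText

def group_by_email(rows):
    """Group result rows by their 'email' field so each recipient gets one message."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["email"]].append(row)
    return dict(grouped)

def main():
    results_path = sys.argv[8]  # gzipped CSV of the alert's search results
    with gzip.open(results_path, "rt") as f:
        rows = list(csv.DictReader(f))
    for recipient, hits in group_by_email(rows).items():
        body = "\n".join(
            "%s fired %s times" % (r["error_code"], r["count"]) for r in hits
        )
        msg = MIMEText(body)
        msg["Subject"] = "Splunk alert: %d error code(s) over threshold" % len(hits)
        msg["From"] = "splunk@example.org"  # assumption: placeholder sender
        msg["To"] = recipient
        s = smtplib.SMTP("localhost")       # assumption: local MTA relays mail
        s.sendmail(msg["From"], [recipient], msg.as_string())
        s.quit()

if __name__ == "__main__" and len(sys.argv) > 8:
    main()
```

This way a single scheduled search can fan out to any number of recipients, and adding a new error code is just a new row in the lookup file rather than a new saved search.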
Remark: I do not recommend using too many real-time alerts; keep them for genuinely urgent cases.
For everything else, use scheduled alerts (with a delay to account for indexing latency).
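As a rough sketch of such a scheduled search, assuming a lookup file named `alert_thresholds` with fields `error_code`, `threshold`, and `email` (the index name and field names here are placeholders, not from your environment):

```
index=app_logs earliest=-65m@m latest=-5m@m
| stats count by error_code
| lookup alert_thresholds error_code OUTPUT threshold email
| where count >= tonumber(threshold)
```

Scheduled every few minutes, this searches a 60-minute window that ends 5 minutes in the past (the 5-minute offset is the indexing-latency delay), and only error codes over their per-code threshold survive the `where` clause. If different codes need different time windows, one scheduled search per distinct window keeps things manageable while the lookup still holds the thresholds and recipients.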