I have a customer who is trying to port over to Splunk from their "home grown" scripting. The request goes like this:
The client wants the first event to be sent to us in real time. A one-hour clock then starts for that one event from that one source. If the same event from the same source comes in again within the next hour, the system should be smart enough to know and not send another alert. If the same event from the same device comes in after the initial hour, it sends another alert and the timer starts all over again.
I know this sounds goofy, but thought I would ask the brighter bulbs on the tree for insight and whether the newer versions of Splunk (4.1.4) can do this.
Thoughts? Rolled eyes...Shoulder shrugs...are all ok.
At the user conference in San Fran they said that 4.2 (due out early next year) will have alert suppression.
Now as to how to do this in 4.1.5: I've been racking my brain quite a bit, and the problem is the fixed "hour". There's no real way for Splunk to remember that it already sent out this alert without storing the data in a summary index. So my idea would be to set up a search that runs every 5 minutes, assigns a decimal value to a new field if the event shows up, and stores the result in a summary index. Then I'd run my actual alert off the summary index and send an alert if the summed field came out to the value of one increment, or to an hour's worth. So if that's not convoluted enough, I'll try to explain how to do it >.>
So I set up a new saved search that runs every 5 minutes. In this saved search I'm going to look for the following:
search Error | stats count as ErrorCount | eval ErrorThreshold= if(ErrorCount>=1,".84","0")
and throw it into a summary index. Then I would run a search every 5 minutes that looks back over the last hour of the summary index, something like:
index=summary search_name=5min_saved_search | stats sum(ErrorThreshold) as ErrorThreshold
and alert if the sum equals .84 or is at least 1 (select a conditional alert and use: search ErrorThreshold=.84 OR ErrorThreshold>=1)
The issue would be that the last event left in the window would also give .84, so you'd get alerted for something you didn't intend or that has already been fixed. The way around that would be to run a subsearch for the previous 5-minute span and, if the threshold is decreasing, not alert.
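To make the rule above a bit more concrete, here is a literal reading of it as a small Python sketch. The function name and arguments are made up for illustration; the default increment of .84 is simply the value used in the search above, and Decimal is used so the equality test isn't thrown off by floating point. It only restates the rule and doesn't claim the rule covers every case.

```python
from decimal import Decimal


def should_alert(current_sum, previous_sum, increment=Decimal("0.84")):
    """Hypothetical helper: apply the rule to two consecutive hourly sums.

    current_sum  -- sum of ErrorThreshold over the last hour, as of this run
    previous_sum -- the same sum as of the run 5 minutes earlier
    """
    if current_sum < previous_sum:
        # The sum is decreasing, so errors are ageing out of the window.
        return False
    # Alert on a single increment, or on roughly an hour's worth.
    return current_sum == increment or current_sum >= 1
```

For example, should_alert(Decimal("0.84"), Decimal("0")) is True, while should_alert(Decimal("0.84"), Decimal("1.68")) is False because the sum went down.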
Hope this mess helps you out a little bit or at least gets you going down the right path!
A "Realtime" alert isn't technically possible in 4.1, but in 4.2 I've been told that this kind of thing should be possible. However, if once a minute is close enough to "realtime" then Splunk can do this.
You may also want to take a look at the AlertThrottle app; it may work for your only-once requirement. If not, you can always take their idea and customize it to your own needs using a custom search command to save off state info about which alerts have been sent and which ones are new.
I wouldn't be surprised if you could come up with some kind of solution that involves a few inputcsv search commands to store off state info, but a simple python search command may end up being much more straightforward. I mean, let's face it, there was some custom code involved in making this alert work in its previous implementation too, right? But I'm guessing that there will be significantly less with a Splunk-based solution.
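For what it's worth, the state-keeping part of such a python command could look roughly like the sketch below. This is plain Python only, not wired into Splunk's search command interface, and everything in it (the AlertThrottle class name, the JSON state file, the "source|event" key format) is an assumption for illustration. It treats an event arriving exactly 3600 seconds after the last alert as outside the window.

```python
import json
from pathlib import Path

SUPPRESS_SECONDS = 3600  # the one-hour window from the original request


class AlertThrottle:
    """Hypothetical helper: remember when each source/event pair last alerted."""

    def __init__(self, state_path=None, window=SUPPRESS_SECONDS):
        self.window = window
        self.state_path = Path(state_path) if state_path else None
        self.last_alert = {}
        # Reload earlier state so the clock survives between scheduled runs.
        if self.state_path and self.state_path.exists():
            self.last_alert = json.loads(self.state_path.read_text())

    def should_alert(self, source, event, now):
        """Return True if this event should alert; `now` is epoch seconds."""
        key = f"{source}|{event}"
        last = self.last_alert.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the hour: suppress, keep the old clock
        self.last_alert[key] = now  # first alert, or hour expired: restart clock
        if self.state_path:
            self.state_path.write_text(json.dumps(self.last_alert))
        return True
```

With throttle = AlertThrottle(), a call like throttle.should_alert("fw01", "link down", 0) alerts, the same call at 1800 is suppressed, and the same call at 3600 alerts again and restarts the hour. Suppressed events deliberately don't reset the clock, since the request says the hour starts from the alert that was actually sent.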