We want to create an alert that runs every 5 minutes, checks the results returned by a query, and triggers if the results for the current calendar day exceed a specific limit.
Once the alert triggers, we want to throttle it for the remainder of the day.
Is that possible?
Keep in mind that a rolling-window approach is not suitable for this problem.
Hello,
Throttling won't work in your case, because you want the suppression to last only until the end of the calendar day, not for a fixed 24-hour interval.
So it should be controlled in the search itself.
sourcetype=log4net source=Accounts "Account notification failed for " | eval date=strftime(_time, "%y-%m-%d") | stats dc(AccountId) as TotalNotificationsFailed by date | join source [| search sourcetype=log4net source=Accounts "Account notification failed for " | sort + _time | head 10 | stats max(_time) as LatestEvtTime] | eval t=now()-300 | where TotalNotificationsFailed > 10 AND t < LatestEvtTime
Thanks
Yep, makes sense. Thanks.
Since your alert runs every 5 minutes, I am only checking whether the 10th error message arrived within the last 5 minutes, which is why t is set to 300 seconds before the current time. If it did not, the alert does not trigger again, so you need not set any other condition.
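To make that check easier to follow, here is a minimal sketch of the same idea written with streamstats instead of join. It is an assumption on my part rather than the search posted above: it counts events, not distinct AccountId values, and it assumes the alert time range is @d to now and that the alert triggers when the number of results is greater than zero.

```spl
sourcetype=log4net source=Accounts "Account notification failed for " earliest=@d latest=now
| sort 0 + _time
| streamstats count as n
| where n=10 AND _time > now()-300
```

The search returns a row only on the run in which the 10th failure of the day is less than 300 seconds old, so later runs on the same day return nothing. If a scheduled run is delayed or skipped, that 5-minute window can be missed, so treat this as a sketch rather than a guaranteed once-per-day trigger.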
What does "t=now()-300" aim to achieve?
My understanding is that you are trying to figure out whether the last trigger time is within the current day and, if not, trigger again?
Sure, consider the following:
sourcetype=log4net source=Accounts "Account notification failed for " | eval date=strftime(_time, "%y-%m-%d") | stats dc(AccountId) as TotalNotificationsFailed by date
Basically, if you get 10 failed notifications in a given calendar day, the alert should fire and then be suppressed for the rest of the day. Currently it fires, the cron schedule kicks in again 5 minutes later, and the alert re-triggers.
The time window is @d to 'now'.
You only get the text box labelled "Suppress the result..." while configuring throttling. See "http://docs.splunk.com/Documentation/Splunk/5.0.5/Alert/Defineper-resultalerts".
Would it be possible for you to provide the search you're executing?
By suppress, do you mean enabling Throttling? I tried that and set it for, e.g., 5 minutes by the Date field. That worked, but after 5 minutes the alert was retriggered.
Try this. Add a field, say 'Date', to your results that holds the current calendar day. Then, in the triggering configuration, select "once per result", and in the throttling configuration provide this 'Date' field under "Suppress the result with the same field value". Since this will be a per-result alert, modify your search to return just one row (which I believe you're doing anyway).
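As a rough sketch of what that could look like, the search below returns a single row carrying a 'Date' field once the daily limit is exceeded. The threshold of 10 and the dc(AccountId) count come from the search posted in this thread; the @d to now time range and the %Y-%m-%d format are assumptions you may want to adjust.

```spl
sourcetype=log4net source=Accounts "Account notification failed for " earliest=@d latest=now
| stats dc(AccountId) as TotalNotificationsFailed
| where TotalNotificationsFailed > 10
| eval Date=strftime(now(), "%Y-%m-%d")
```

With throttling keyed on 'Date', the suppression period should be long enough to cover the rest of the day (for example 24 hours rather than 5 minutes). The next day's row carries a different 'Date' value, so it should not be suppressed by the previous day's trigger.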