Alerting

Throttling out repeated events

Communicator

I have a hypothetical search that runs every 5 minutes and scans the last hour's worth of data for certain errors:

index=log "error" earliest=-1h | eventstats c as user_errors by username | where user_errors>5 | dedup username | table ip, username, user_errors

I want to alert on users who cause more than 5 errors in any one-hour period.
Each run of this search generates roughly 10 alerts, and 6-8 of the users are the same as last time, because the "sliding" window is an hour wide and is scanned every 5 minutes.
Ideally, I'd like to throttle alerts per username: alert on every unique user, but never on the same user more often than once an hour.

The only way I can see to do that is to set "Alerting mode" to "Once per result" and set "Per result throttling fields" to username.
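For reference, those UI choices correspond roughly to these savedsearches.conf settings (a sketch; the stanza name is made up):

```
[errors_per_user_alert]
# alert.digest_mode = 0 means "once per result" rather than one digest alert
alert.digest_mode = 0
alert.suppress = 1
alert.suppress.fields = username
alert.suppress.period = 60m
```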

What happens after I do that is that I receive an email alert for only one user and it misses all the other users. Then I receive no further alerts at all until the next hour, and again only for a single user.

Any way to fix that?

1 Solution

Esteemed Legend

You can do this with some variation of dynamic lookups:
http://wiki.splunk.com/Dynamically_Editing_Lookup_Tables

One approach is like this:
You have a lookup table with input field username and output field last_alert_time.
Before you generate an alert (in the search), look up username in last_alert_by_user_lookup.csv to get last_alert_time, and only alert if _time - last_alert_time > threshold_seconds (or if last_alert_time is null).
Every time you generate an alert, call a script to update last_alert_by_user_lookup.csv and "upsert" the user's last_alert_time (you could also do this with another scheduled search, among other ways, but a script is probably easiest).
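In SPL, the lookup gate might look something like this (a sketch; it assumes last_alert_by_user_lookup.csv is available as a lookup with columns username and last_alert_time holding epoch seconds, and uses a 3600-second threshold):

```
index=log "error" earliest=-1h
| eventstats c AS user_errors BY username
| where user_errors > 5
| dedup username
| lookup last_alert_by_user_lookup.csv username OUTPUT last_alert_time
| where isnull(last_alert_time) OR (_time - last_alert_time > 3600)
| table ip, username, user_errors
```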

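A minimal sketch of the "upsert" script described above, in Python. The file name, column names, and command-line interface are all assumptions, not anything Splunk mandates:

```python
#!/usr/bin/env python
"""Upsert last_alert_time for the given usernames in a CSV lookup.

Hypothetical sketch: reads last_alert_by_user_lookup.csv (columns:
username,last_alert_time), sets last_alert_time to "now" for every
username passed on the command line, and writes the file back.
"""
import csv
import sys
import time

LOOKUP = "last_alert_by_user_lookup.csv"


def upsert(usernames, path=LOOKUP, now=None):
    now = int(now if now is not None else time.time())
    rows = {}
    try:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                rows[row["username"]] = row["last_alert_time"]
    except FileNotFoundError:
        pass  # first run: the lookup file does not exist yet
    for user in usernames:
        rows[user] = str(now)  # update an existing row or insert a new one
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "last_alert_time"])
        for user, ts in sorted(rows.items()):
            writer.writerow([user, ts])


if __name__ == "__main__":
    upsert(sys.argv[1:])
```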


Communicator

Yes, I think this is a good approach.
It's reasonably flexible and can be applied to more complex throttling logic as well.

Thank you.


Esteemed Legend

Please "accept" my answer if it works for you.


Communicator

No prob, thank you.


SplunkTrust

You could change your search to this:

index=log "error" earliest=-1h | stats c as user_errors latest(ip) as ip by username | where user_errors > 5

This should be a bit more efficient than a late dedup... unrelated to the actual question, of course.
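If you also adopt the lookup-based throttle from the accepted answer, the two ideas combine into something like this (a sketch; the lookup file name and the 3600-second threshold are assumptions, and now() is used because stats drops _time):

```
index=log "error" earliest=-1h
| stats c AS user_errors latest(ip) AS ip BY username
| where user_errors > 5
| lookup last_alert_by_user_lookup.csv username OUTPUT last_alert_time
| where isnull(last_alert_time) OR (now() - last_alert_time > 3600)
| table ip, username, user_errors
```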


Communicator

Thank you.
