I have a hypothetical search that runs every 5 minutes and scans the last hour's worth of data for certain errors:
index=log "error" earliest=-1h | eventstats c as user_errors by username | where user_errors>5 | dedup username | table ip, username, user_errors
I want to alert on users who cause more than 5 errors in any one-hour period.
This search generates approximately 10 alerts each run, of which 6-8 are for the same users, because the "sliding" window is an hour wide and is scanned every 5 minutes.
Ideally I'd like to throttle alerts per username.
So I want to receive alerts for all "unique" users, but never receive alerts for the same user more often than once an hour.
The only way I see to do that is to set "Alerting mode" to "Once per result" and "Per result throttling fields" to username.
What happens after I do that is that I receive an email alert for only one user and it misses all the other users. Then I receive no further alerts whatsoever until the next hour, and again only for one user.
Any way to fix that?
You can do this with some variation of dynamic lookups:
http://wiki.splunk.com/Dynamically_Editing_Lookup_Tables
One approach is like this:
You have a lookup table with input field username and output field last_alert_time.
Before you generate an alert (in the search), you do a lookup in last_alert_by_user_lookup.csv for username to get last_alert_time, and only alert if _time - last_alert_time > threshold_seconds (or if last_alert_time is null).
Every time that you generate an alert, you call a script to update last_alert_by_user_lookup.csv to "upsert" the user's last_alert_time (you could do this with another scheduled search, among other ways, but a script is probably easiest).
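A minimal sketch of such an upsert script in Python (the filename and column names are the hypothetical ones used above; in a real alert script you'd read the alerted usernames from the results Splunk hands the script):

```python
import csv
import os
import time

LOOKUP_PATH = "last_alert_by_user_lookup.csv"  # hypothetical lookup file path

def upsert_last_alert(username, alert_time, path=LOOKUP_PATH):
    """Insert or update the user's last_alert_time in the lookup CSV."""
    rows = {}
    # Load any existing rows so other users' timestamps are preserved.
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                rows[row["username"]] = row["last_alert_time"]
    # Upsert: overwrite this user's timestamp, or add a new row.
    rows[username] = str(int(alert_time))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "last_alert_time"])
        for user, ts in rows.items():
            writer.writerow([user, ts])

if __name__ == "__main__":
    upsert_last_alert("some_user", time.time())
```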
Yes, I think this is a good approach.
It's reasonably flexible and can be applied to more complex throttling logic as well.
Thank you.
Please "accept" my answer if it works for you.
No prob, thank you.
You could change your search to this:
index=log "error" earliest=-1h | stats c as user_errors latest(ip) as ip by username | where user_errors > 5
It should be a bit more efficient than a late dedup
... though that's unrelated to the actual question, of course.
Thank you.