Alerting

Throttling out repeated events

gesman
Communicator

I have a hypothetical search that runs every 5 minutes and scans the last hour's worth of data for certain errors:
index=log "error" earliest=-1h | eventstats c as user_errors by username | where user_errors>5 | dedup username | table ip, username, user_errors
I want to alert on users who cause more than 5 errors in any one-hour period.
This search generates approximately 10 alerts each time, where 6-8 users are the same, because the "sliding" window is an hour wide and is scanned every 5 minutes.
Ideally I'd want to throttle alerts for the same username.
So I want to receive alerts for all "unique" users, but never receive alerts for the same user more often than once an hour.

The only way I see to do that is to set "Alerting mode" to "Once per result" and set "Per result throttling fields" to username.

What happens after I do that is that I receive an email alert for only one user and it misses all the other users. Then I do not receive any more alerts at all until the next hour, and again only for one user.

Any way to fix that?

0 Karma
1 Solution

woodcock
Esteemed Legend

You can do this with some variation of dynamic lookups:
http://wiki.splunk.com/Dynamically_Editing_Lookup_Tables

One approach is like this:
You have a lookup table with input field username and output field last_alert_time.
Before you generate an alert (in the search), you look up username in last_alert_by_user_lookup.csv to get last_alert_time, and only alert if _time - last_alert_time > threshold_seconds (or if last_alert_time is null).
Every time you generate an alert, you call a script to update last_alert_by_user_lookup.csv and "upsert" that user's last_alert_time (you could also do this with another scheduled search, among other ways, but a script is probably easiest).
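Roughly, the alerting search could look like this (untested sketch; it assumes last_alert_time is stored as epoch seconds in last_alert_by_user_lookup.csv, that a lookup definition named last_alert_by_user points at that file, and that your throttle threshold is 3600 seconds):

index=log "error" earliest=-1h
| eventstats c AS user_errors BY username
| where user_errors > 5
| dedup username
| lookup last_alert_by_user username OUTPUT last_alert_time
| where isnull(last_alert_time) OR (_time - last_alert_time) > 3600
| table ip, username, user_errors

The scheduled-search variant of the "upsert" could then be a second search on the same schedule: it re-runs the filtered search, stamps now() as last_alert_time for the users that just alerted, merges that with the existing lookup, and writes the result back:

| inputlookup last_alert_by_user_lookup.csv
| append
    [ search index=log "error" earliest=-1h
      | eventstats c AS user_errors BY username
      | where user_errors > 5
      | dedup username
      | lookup last_alert_by_user username OUTPUT last_alert_time
      | where isnull(last_alert_time) OR (_time - last_alert_time) > 3600
      | eval last_alert_time = now()
      | fields username last_alert_time ]
| stats max(last_alert_time) AS last_alert_time BY username
| outputlookup last_alert_by_user_lookup.csv

Only users that actually fired an alert get a new last_alert_time, so a user who is currently throttled becomes eligible again an hour after their last alert.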

gesman
Communicator

Yes, I think this is a good approach.
It's reasonably flexible and can be applied to more complex throttling logic as well.

Thank you.

0 Karma

woodcock
Esteemed Legend

Please "accept" my answer if it works for you.

0 Karma

gesman
Communicator

No prob, thank you.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

You could change your search to this:

index=log "error" earliest=-1h | stats c as user_errors latest(ip) as ip by username | where user_errors > 5

That should be a bit more efficient than a late dedup... unrelated to the actual question, of course.

0 Karma

gesman
Communicator

Thank you.

0 Karma