I'm trying to use the streamstats-command with time_window to track when certain user actions happen more than twice in a span of an hour. My search is like this ("dedup _time" because we get duplicate rows)
<search>
| sort _time
| fields _time user
| dedup _time
| streamstats time_window=60min count as amount by user
| where amount > 2
| table _time user amount
This search works correctly otherwise, but the problem is that it triggers multiple times when the user action happens more than twice in an hour. So I get results like:
2022-01-12 16:04:56.482 username1 3
2022-01-12 16:07:58.525 username1 4
2022-01-12 16:13:16.137 username1 5
2022-01-12 16:14:30.255 username1 6
So how can I get only the largest result (in this case 6) in a sequence like this? I can't use "dedup user" because the alert may trigger for the same user at some other time as well, and that should be reported as its own case.
Any help is greatly appreciated.
It's a tricky one and I can't find a reasonable solution with streamstats.
But you can do it another way.
<your search>
| transaction maxspan=1h maxevents=-1
| table _time eventcount
I must admit though that it's an ugly solution (and transaction command has its limits).
And you might want to resort your events from earliest to lates instead of the default latest to earliest order.
I'd be cautions with dedup (interesting - that's another topic today with dedup :-)). If you only do dedup on _time you might lose events for different users at the same time (applicability of course depends on the resolution of your _time). If you have simply duplicated events, you should rather use "dedup _time user".
And about the streamstats - well, it just works this way. You're asking for the count over a sliding window so it calculates it for each event over range-size window "backwards" in time.
And it's not obvious how those logins should be counted if - for example - a user logs in every 10 minutes over a span of 5 hours. What result would you expect?
"a user logs in every 10 minutes over a span of 5 hours. What result would you expect?"
That's a good point. Thinking about it, I would perhaps want 5 alerts of six events each - like binning by 60min, but only if the alert triggers. So, if the alert triggers, create a bin from the time of the first event in the sequence to +60 min. After that start running the streamstats again. I don't know if this is possible in Splunk though.
And thank you for pointing out the problem with dedup.
It's a tricky one and I can't find a reasonable solution with streamstats.
But you can do it another way.
<your search>
| transaction maxspan=1h maxevents=-1
| table _time eventcount
I must admit though that it's an ugly solution (and transaction command has its limits).
And you might want to resort your events from earliest to lates instead of the default latest to earliest order.
This solution works, thank you for your help!
We only have a couple thousand of these events in the index in total, so memory usage should not be a problem.