This is in regards to using the streamstats command with a "by" clause, and at the same time specifying window=N to tell it to only compute the statistics using the N most recent rows.
The Splunk docs for streamstats say that the window will take into account the "by" field:
See here under "More examples"
http://docs.splunk.com/Documentation/Splunk/6.0.1/SearchReference/Streamstats
Specifically it says:
Example 1: Compute the average value of foo for each value of bar including only the only 5
events with that value of bar.
... | streamstats avg(foo) by bar window=5 global=f
However, this does not seem to be the case. When I use window=N with a by clause, the window logic seems to ignore the by clause and looks only at the N previous rows (5 in the docs example), regardless of what value they have for the "by" field. Of course, depending on your sort order, those rows may or may not have the same value for the "by" field as the current row, and when streamstats calculates the statistics for those rows, it does correctly discard rows whose "by" fields don't match.
The end result is confusion!
Does anyone know whether the docs are wrong or whether this is a bug in streamstats?
And can anyone think of a workaround? I basically need this to process rows that have _time, deviceName, and a field called isBlank that is either 1 or 0.
| streamstats current=f window=24 sum(isBlank) as rollingBlankHourCount by deviceName
The docs are correct. If global=f, the window is per "by" field value.
It seems to work properly for me. Do you get different behavior if you set global=t?
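For the search in the question, a per-device window would look something like the sketch below. The only change is the added global=f; the field names and the window size are copied from the question as-is, and I have not run this against your data.

```
| streamstats current=f window=24 global=f sum(isBlank) as rollingBlankHourCount by deviceName
```

With global=f, each deviceName should get its own window of up to 24 previous rows, rather than all devices sharing one window of 24 rows.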
Fair enough. Thanks for the explanation!
I agree that when by is specified, wanting a global window is a minority use case. But I can think of cases where you might want something like the count by type for the last 10000 events. I mainly left the default as global=t because it's more efficient to have a global window, so I'd rather make the user explicitly request a non-global window.
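As a rough sketch of that case, assuming the events carry a field named type, something like this keeps one shared window of the last 10000 events and counts by type within it (global=t is written out explicitly here for clarity):

```
... | streamstats window=10000 global=t count by type
```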
Understood. The way it works is just confusing and I made a reasonable assumption that turned out to be false. The global=t|f argument is only relevant if you're using a "by" clause with window=N, and if you're using a "by" clause, 99% of the time you'll want the global="f" behavior. So I made the reasonable but bad assumption that global="f" was the default. =/
Is there any case you can think of where you want to split things up with a "by" clause, but you want to keep the window operating in a global fashion? I can't think of why you'd ever want that.