
Problem with the streamstats command when using both window=N and a by clause

sideview
SplunkTrust

This is about using the streamstats command with a "by" clause while also specifying window=N, which tells it to compute the statistics using only the N most recent rows.

The Splunk docs for streamstats say that the window will take into account the "by" field:

See the "More examples" section here:
http://docs.splunk.com/Documentation/Splunk/6.0.1/SearchReference/Streamstats

Specifically it says:

Example 1: Compute the average value of foo for each value of bar including only the last 5 events with that value of bar.

... | streamstats avg(foo) by bar window=5 global=f

However, this does not seem to be the case. When I use window=N with a by clause, the window logic seems to ignore the by clause: it looks only at the 5 most recent rows, regardless of their values for the by field. Depending on your sort order, those rows may or may not have the same by-field value as the current row. When streamstats calculates the statistics over those 5 rows, it does correctly discard the rows whose by-field values don't match.

The end result is confusion!
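The two behaviors can be made concrete with a small Python model. This is a hedged sketch, not Splunk's implementation: it assumes current=t (the default) and treats the window as the N most recent rows, where global=t means one window shared by every row and global=f means a separate window per value of bar. The function name `rolling_avg` and the sample rows are hypothetical.

```python
from collections import defaultdict, deque


def rolling_avg(rows, window, global_window):
    """Model `streamstats avg(foo) by bar window=N global=t|f` (current=t).

    rows: list of (bar, foo) tuples in search-result order.
    Returns the running average of foo emitted for each row.
    """
    shared = deque(maxlen=window)  # global=t: one window shared by all rows
    per_group = defaultdict(lambda: deque(maxlen=window))  # global=f: one window per bar
    out = []
    for bar, foo in rows:
        if global_window:
            shared.append((bar, foo))
            # The window is the last N rows overall; rows with a different
            # bar value are then discarded before averaging.
            vals = [v for b, v in shared if b == bar]
        else:
            per_group[bar].append(foo)
            # The window is the last N rows *with this bar value*.
            vals = list(per_group[bar])
        out.append(sum(vals) / len(vals))
    return out


rows = [("a", 1), ("b", 10), ("a", 3), ("b", 20), ("a", 5)]
print(rolling_avg(rows, window=2, global_window=True))   # [1.0, 10.0, 3.0, 20.0, 5.0]
print(rolling_avg(rows, window=2, global_window=False))  # [1.0, 10.0, 2.0, 15.0, 4.0]
```

With interleaved a/b rows and window=2, the shared window never holds two rows with the same bar value, so each "average" collapses to the current row alone, which is exactly the confusing result described above.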

Does anyone know whether the docs are wrong or whether this is a bug in streamstats?

And can anyone think of a workaround? I basically need this to process rows that have _time, deviceName, and a field called isBlank that is either 1 or 0.

| streamstats current=f window=24  sum(isBlank) as rollingBlankHourCount by deviceName
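If the window is kept per deviceName (the global=f behavior the docs quoted above describe), the intended result of this search can be sketched in Python as below. This is an illustrative model only: `rolling_blank_count` is a hypothetical name, and representing an empty field for a device's first row as `None` is an assumption about how that case surfaces.

```python
from collections import defaultdict, deque


def rolling_blank_count(rows, window=24):
    """Model `streamstats current=f window=24 global=f
    sum(isBlank) as rollingBlankHourCount by deviceName`.

    rows: list of (deviceName, isBlank) tuples in search-result order.
    """
    history = defaultdict(lambda: deque(maxlen=window))  # one window per device
    out = []
    for device, is_blank in rows:
        prev = history[device]
        # current=f: sum the earlier rows for this device before adding the
        # current one. None stands in for an empty field when there is no
        # earlier row (assumption).
        out.append(sum(prev) if prev else None)
        prev.append(is_blank)
    return out


rows = [("d1", 1), ("d2", 0), ("d1", 1), ("d1", 0), ("d2", 1)]
print(rolling_blank_count(rows, window=2))  # [None, None, 1, 2, 0]
```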

steveyz
Splunk Employee

The docs are correct. If global=f, the window is per "by" field value.

It seems to work properly for me. Do you get different behavior if you set global=t?

sideview
SplunkTrust

Fair enough. Thanks for the explanation!

steveyz
Splunk Employee

I agree that when by is specified, wanting a global window is a minority use case. But I can think of cases where you might want something like the count by type for the last 10000 events. I mainly left the default as global=t because a global window is more efficient, so I'd rather make the user explicitly request a non-global window.
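To illustrate that use case, and one plausible reason a global window is cheaper, here is a hedged Python model (not Splunk's code; the function name and sample data are hypothetical). With a single shared window, one bounded buffer plus a running count per type is enough, and each new event costs constant work; a per-value window would instead need a separate buffer for every distinct type.

```python
from collections import Counter, deque


def count_by_type_last_n(types, window):
    """Model `streamstats count by type window=N` with global=t: for each
    event, count the events among the last N (of any type) that share its type."""
    buf = deque()       # the single shared window
    counts = Counter()  # running count of each type inside that window
    out = []
    for t in types:
        buf.append(t)
        counts[t] += 1
        if len(buf) > window:
            # Evict the oldest event and keep the running count in sync.
            counts[buf.popleft()] -= 1
        out.append(counts[t])
    return out


print(count_by_type_last_n(["x", "y", "x", "x", "y"], window=3))  # [1, 1, 2, 2, 1]
```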

sideview
SplunkTrust

Understood. The way it works is just confusing and I made a reasonable assumption that turned out to be false. The global=t|f argument is only relevant if you're using a "by" clause with window=N, and if you're using a "by" clause, 99% of the time you'll want the global="f" behavior. So I made the reasonable but bad assumption that global="f" was the default. =/

Is there any case you can think of where you want to split things up with a "by" clause, but you want to keep the window operating in a global fashion? I can't think of why you'd ever want that.
