I have a query using streamstats that is on the intensive side because I'm not dealing with nicely-formatted data. (Legacy code FTW)
To help with performance, I added the fields command to keep only the fields the query needs, but I'm no longer getting the results I expect.
This query adds the prev_field_of_interest field as expected.
eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
If I add the fields command, though, I no longer get any prev_field_of_interest fields added to my results, suggesting it somehow broke streamstats.
eventtype=my_eventtype
| fields source,sourcetype,index,host,foo,bar,field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
All I get are the exact fields I asked for (and calculated via eval). 😕
Am I doing something wrong or does fields break streamstats?
It doesn't make sense to me since fields is a distributable streaming command, whereas streamstats is centralized streaming...
Splunk version is 7.0.2
Thanks!
rmmiller
| makeresults count=100
| eval foo=substr("abcde",random() % 4 , 1)
| eval bar=substr("abcde",random() % 4 , 1)
| eval field_of_interest=substr("abcdefghijklmnopqrstuvwxyz",random() % 26 , 4)
| fields foo,bar,field_of_interest
| fields - _time
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
There seems to be no problem using the transform command.
Search Manual - Write better searches
Perhaps the fields command behaves unexpectedly when more streaming commands follow it.
eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| fields source, sourcetype, index, host, foo, bar, foobar, prev_field_of_interest
Is the speed of this version not good enough?
If you can share a little more information, we can help with query optimization.
Thank you for the help, to4kawa and woodcock.
I revisited all of the queries I had been working with when I got this result, and this does not appear to be an issue with Splunk at all.
Instead, it's the dreaded PEBCAK problem. I realized I was working with too short a time range, so my search results never contained more than one event per value of foobar (my BY clause). Obviously streamstats can't detect a change in field_of_interest without at least one other event with the same value of foobar to compare against. Therefore, prev_field_of_interest wasn't being added to any events.
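A minimal way to see this effect, assuming a scratch search built on makeresults (the counter n and the generated values are made up for illustration, they are not from my real data): every event gets its own foobar value, so streamstats has no earlier event in the same group to read from, and prev_field_of_interest should stay empty on every row.

```spl
| makeresults count=3
| streamstats count as n
| eval foobar="GROUP ".n
| eval field_of_interest="value ".n
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| table n, foobar, field_of_interest, prev_field_of_interest
```

Making the groups repeat, for example by building foobar from the remainder of n divided by 2 instead of n itself, should populate prev_field_of_interest from the second event of each group onward.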
Another lightbulb moment was that since I'm not computing any statistics (in the truest sense of the word) with streamstats and am just looking for a change in field_of_interest, the window argument has zero bearing on the result and can be dropped.
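A rough side-by-side check of that claim, again on made-up makeresults data (n, grp, prev_windowed, and prev_unwindowed are illustrative names only). It assumes field_of_interest is populated on every event; as far as I understand, last() skips empty values, so with gaps in the data the windowed and unwindowed versions could reach back to different events.

```spl
| makeresults count=6
| streamstats count as n
| eval grp=n % 2
| eval foobar="GROUP ".grp
| eval field_of_interest="value ".n
| streamstats current=false window=5 global=false last(field_of_interest) as prev_windowed by foobar
| streamstats current=false last(field_of_interest) as prev_unwindowed by foobar
| table n, foobar, field_of_interest, prev_windowed, prev_unwindowed
```

If the two prev_ columns match on every row, the window argument is not contributing anything for this use case.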
Thanks,
rmmiller
I agree that this is a bug, so the right thing to do is to open a support case so that it will be fixed. You can probably pin it to the search optimization code by disabling that feature as described here:
https://docs.splunk.com/Documentation/Splunk/latest/Search/Built-inoptimization#Turn_off_optimizatio...
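As far as I recall, the per-search switch described on that page is a noop command appended to the end of the search. A sketch against the query from this thread; please check the linked page for the exact syntax on your version before relying on it:

```spl
eventtype=my_eventtype
| fields source,sourcetype,index,host,foo,bar,field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| noop search_optimization=false
```

If the results differ with optimization turned off, that would point at the optimizer rather than at fields or streamstats themselves.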
Thanks, woodcock. I'll give it a try on Monday when I return to the office and reply back with my results.
rmmiller
OK, frustrated... I went back to the exact query that I thought I was running last week, and streamstats is working fine in conjunction with fields. I will keep trying to find the query that was giving me unexpected behavior and come back to this thread to confirm.
Thanks,
rmmiller
Accepting this answer. There is no apparent conflict between fields and streamstats after revisiting my queries. Thanks!
Hi to4kawa, and thanks for replying.
Both foo and bar are present in the search result, as they both appear in the arguments to the fields command.
| fields source,sourcetype,index,host,foo,bar,field_of_interest
Thanks,
rmmiller
I don't have _time so I think it's a different result.
If I add this to the end of my search, the _time column is populated correctly:
| table _time,next_acd_cooker_binary,acd_cooker_binary,acd_custfilecombo
I don't think fields will drop internal fields like _time.