Splunk Search

Does fields break streamstats?

rmmiller
Contributor

I have a query using streamstats that is on the intensive side because I'm not dealing with nicely-formatted data. (Legacy code FTW)
To help with performance, I added the fields command to keep only the fields the query needs to function, but I'm no longer getting the results I expect.

This query adds the prev_field_of_interest field as expected.

eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar

If I add the fields command, though, I no longer get any prev_field_of_interest fields added to my results, suggesting it somehow broke streamstats.

eventtype=my_eventtype
| fields source,sourcetype,index,host,foo,bar,field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar

All I get are the exact fields I asked for (and calculated via eval). 😕

  1. I thought maybe it was Fast vs. Smart mode. No difference.
  2. I thought maybe it was the BY clause on an evaluated field, so as a test I changed it to one of the extracted fields. No difference.
  3. I tried with and without source,sourcetype,index, and host, but their presence made no difference.
  4. Just for giggles, I tried adding prev_field_of_interest to the fields command, thinking maybe fields was processing stuff farther on down the pipe as well. No difference. 😞

Am I doing something wrong or does fields break streamstats?
It doesn't make sense to me since fields is a distributable streaming command, whereas streamstats is centralized streaming...

Splunk version is 7.0.2

Thanks!
rmmiller

to4kawa
Ultra Champion
| makeresults count=100
| eval foo=substr("abcde",random() % 4 , 1)
| eval bar=substr("abcde",random() % 4 , 1)
| eval field_of_interest=substr("abcdefghijklmnopqrstuvwxyz",random() % 26 , 4)
| fields foo,bar,field_of_interest
| fields - _time
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar

In this test, there seems to be no problem using fields before streamstats.
See Search Manual - Write better searches.
Perhaps the fields command behaves unexpectedly when a streaming command follows it. In that case, try running fields at the end instead:

eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| fields source, sourcetype, index, host, foo, bar, foobar, prev_field_of_interest

Is this version not fast enough for you?
If you can share a little more information, we can help with query optimization.

rmmiller
Contributor

Thank you for the help, to4kawa and woodcock.

I revisited all of the queries I had been working with when I got this result, and this does not appear to be an issue with Splunk at all.

Instead, it's the dreaded PEBCAK problem. I realized I was working with too short a time range, so my search results did not contain more than one event per value of foobar (my BY clause). Obviously streamstats can't detect a change in field_of_interest without at least one other event with the same value of foobar to compare against. Therefore, I wasn't getting prev_field_of_interest added to any events.
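A quick way to rule this out before suspecting streamstats is to count events per BY-clause value over the same time range. This is a minimal sketch that reuses the eventtype and the foo/bar/foobar evals from the question; the group_events name is only illustrative:

```spl
eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| stats count as group_events by foobar
| where group_events > 1
```

If this returns no rows, no foobar value has a second event in the time range, so streamstats current=false never has an earlier event in the group to take last(field_of_interest) from.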

Another lightbulb moment was that since I'm not computing any statistics (in the truest sense of the word) with streamstats and am just looking for a change in field_of_interest, the window argument has zero bearing on the result and can be dropped.
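With window dropped, the search could look like the sketch below. global only has an effect when window is set, so it is dropped too. This assumes field_of_interest is present on every event: stats-style functions such as last() only consider events where the field exists, so if it is sometimes missing, a window can change which earlier value is returned. The final where clause is just one illustrative way to keep only the events where the value changed:

```spl
eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false last(field_of_interest) as prev_field_of_interest by foobar
| where isnotnull(prev_field_of_interest) AND field_of_interest!=prev_field_of_interest
```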

Thanks,
rmmiller

woodcock
Esteemed Legend

I agree that this is a bug, so the right thing to do is to open a support case so that it gets fixed. You can probably pin it on the search optimization code by disabling that feature as described here:
https://docs.splunk.com/Documentation/Splunk/latest/Search/Built-inoptimization#Turn_off_optimizatio...
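For a single search, the Built-in optimization page describes appending the noop command with search_optimization=false. Here is a sketch using the failing query from the question. Check the linked page for your version (7.0.2) to confirm the option is available there:

```spl
eventtype=my_eventtype
| fields source,sourcetype,index,host,foo,bar,field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| noop search_optimization=false
```

If prev_field_of_interest appears with optimization turned off but not with it on, that points at the optimizer rather than at fields or streamstats.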

rmmiller
Contributor

Thanks, woodcock. I'll give it a try on Monday when I return to the office and reply back with my results.
rmmiller

rmmiller
Contributor

OK, frustrated... I went back to the exact query that I thought I was running last week, and streamstats works fine in conjunction with fields. I will keep trying to find the query that gave me the unexpected behavior and come back to this thread to confirm.

Thanks,
rmmiller

rmmiller
Contributor

Accepting to4kawa's answer. There is no apparent conflict between fields and streamstats after revisiting my queries. Thanks!

rmmiller
Contributor

Hi to4kawa, and thanks for replying.

Both foo and bar are present in the search result, as they both appear in the arguments to the fields command.

| fields source,sourcetype,index,host,foo,bar,field_of_interest

Thanks,
rmmiller

to4kawa
Ultra Champion

My test search drops _time (fields - _time), so I think that may be why the result is different.

rmmiller
Contributor

If I add this to the end of my search, the _time column is populated correctly:
| table _time,next_acd_cooker_binary,acd_cooker_binary,acd_custfilecombo

I don't think fields will drop internal fields like _time.
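That matches how the fields command is documented: the internal fields _raw and _time are kept in the output unless you remove them explicitly, for example with fields - _time as in to4kawa's test search. A minimal sketch on generated data (the foo and bar values are made up):

```spl
| makeresults count=3
| eval foo="a", bar="b"
| fields foo, bar
| fields - _time
```

After the first fields command, _time is still present alongside foo and bar. Only the explicit fields - _time removes it.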