Splunk Search

Does fields break streamstats?

rmmiller
Contributor

I have a query using streamstats that is on the intensive side because I'm not dealing with nicely-formatted data. (Legacy code FTW)
To help with performance, I added the fields command to keep only the fields the query needs, but I'm no longer getting the results I expect.

This query adds the prev_field_of_interest field as expected.

eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar

If I add the fields command, though, I no longer get any prev_field_of_interest fields added to my results, suggesting it somehow broke streamstats.

eventtype=my_eventtype
| fields source,sourcetype,index,host,foo,bar,field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar

All I get are the exact fields I asked for (and calculated via eval). 😕

  1. I thought maybe it was Fast vs. Smart mode. No difference.
  2. I thought maybe it was the BY clause on an evaluated field, so as a test I changed it to one of the extracted fields. No difference.
  3. I tried with and without source,sourcetype,index, and host, but their presence made no difference.
  4. Just for giggles, I tried adding prev_field_of_interest to the fields command, thinking maybe fields was processing stuff further down the pipeline as well. No difference. 😞

Am I doing something wrong or does fields break streamstats?
It doesn't make sense to me since fields is a distributable streaming command, whereas streamstats is centralized streaming...

Splunk version is 7.0.2

Thanks!
rmmiller

1 Solution

to4kawa
Ultra Champion
| makeresults count=100
| eval foo=substr("abcde",random() % 4 , 1)
| eval bar=substr("abcde",random() % 4 , 1)
| eval field_of_interest=substr("abcdefghijklmnopqrstuvwxyz",random() % 26 , 4)
| fields foo,bar,field_of_interest
| fields - _time
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar

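The test search above, which uses fields before streamstats, seems to work without any problem.
Search Manual - Write better searches
Perhaps the fields command behaves unexpectedly when more streaming commands follow it. As an alternative, try moving fields to the end of the search: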

eventtype=my_eventtype
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| fields source, sourcetype, index, host, foo, bar, foobar, prev_field_of_interest

Is the speed of this version not good enough?
If you can share a little more information, we can help with query optimization.

rmmiller
Contributor

Thank you for the help, to4kawa and woodcock.

I revisited all of the queries I had been working with when I got this result, and this does not appear to be an issue with Splunk at all.

Instead, it's the dreaded PEBCAK problem. I realized I was working with too short a time range, so my search results never contained more than one event per value of foobar (my BY clause). streamstats can't detect a change in field_of_interest unless there is at least one earlier event with the same value of foobar to compare against, so prev_field_of_interest wasn't being added to any events.
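
The situation described above can be reproduced with generated data. This is a minimal sketch, not the original query: the helper field n and the literal values KEY and value are made up for illustration. Every event gets a unique foobar, so streamstats with current=false never has an earlier event in the same group, and prev_field_of_interest should not appear on any row.

```spl
| makeresults count=4
| streamstats count as n
| eval foobar="KEY ".n
| eval field_of_interest="value".n
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| table n foobar field_of_interest prev_field_of_interest
```

Changing the third line to eval foobar="KEY ".(n % 2) makes each key occur twice, and the later event of each pair should then carry the earlier event's value. On real data, appending | stats count by foobar | where count > 1 after the eval commands is a quick way to see whether any group in the selected time range has more than one event.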

Another lightbulb moment was that since I'm not computing any statistics (in the truest sense of the word) with streamstats and am just looking for a change in field_of_interest, the window argument has zero bearing on the result and can be dropped.
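
A sketch of what the simplified search might look like with the window argument dropped (global only applies together with window, so it goes too). The final where clause is an assumption about how a change would be detected; it is not from the original query. The equivalence with the windowed version also assumes field_of_interest is present on every event.

```spl
eventtype=my_eventtype
| fields foo, bar, field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false last(field_of_interest) as prev_field_of_interest by foobar
| where field_of_interest!=prev_field_of_interest
```

Events with no earlier event in the same foobar group have no prev_field_of_interest, so the comparison does not evaluate to true and the where clause drops them.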

Thanks,
rmmiller

woodcock
Esteemed Legend

I agree that this is a bug, so the right thing to do is to open a support case so that it gets fixed. You can probably pin it on the search optimization code by disabling that feature, as described here:
https://docs.splunk.com/Documentation/Splunk/latest/Search/Built-inoptimization#Turn_off_optimizatio...
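
For reference, the approach on that documentation page is to end the search with a noop command that disables optimization for that one search. The sketch below applies it to the query from the question; whether the argument behaves the same way on 7.0.2 is an assumption that should be checked against the documentation for that version.

```spl
eventtype=my_eventtype
| fields source,sourcetype,index,host,foo,bar,field_of_interest
| eval foo=upper(foo)
| eval bar=upper(bar)
| eval foobar=foo+" "+bar
| streamstats current=false window=5 global=false last(field_of_interest) as prev_field_of_interest by foobar
| noop search_optimization=false
```

If the results differ with and without the last line, that points at the optimizer; if they are identical, the cause is likely elsewhere.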

rmmiller
Contributor

Thanks, woodcock. I'll give it a try on Monday when I return to the office and reply back with my results.
rmmiller

rmmiller
Contributor

OK, frustrated.. I went back to the exact query that I thought I was running last week, and streamstats is working fine in conjunction with fields. I will keep trying to find the query that was giving me unexpected behavior and come back to this thread to confirm.

Thanks,
rmmiller

rmmiller
Contributor

Accepting this answer. There is no apparent conflict between fields and streamstats after revisiting my queries. Thanks!

rmmiller
Contributor

Hi to4kawa, and thanks for replying.

Both foo and bar are present in the search result, as they both appear in the arguments to the fields command.

| fields source,sourcetype,index,host,foo,bar,field_of_interest

Thanks,
rmmiller

to4kawa
Ultra Champion

My test search has no _time field, so I think that is why the result is different.

rmmiller
Contributor

If I add this to the end of my search, the _time column is populated correctly:
| table _time,next_acd_cooker_binary,acd_cooker_binary,acd_custfilecombo

I don't think fields will drop internal fields like _time.
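
One way to check this on generated data is the sketch below (the field names and values are placeholders, not from the real search). As far as I know, fields with an explicit list keeps internal fields such as _time unless they are removed explicitly, so the _time column should still be populated here.

```spl
| makeresults count=3
| eval foo="a"
| eval bar="b"
| fields foo, bar
| table _time foo bar
```

Adding | fields - _time before the table command should leave the _time column empty, which matches the explicit removal in the test search posted earlier in the thread.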