Splunk Search

streamstats is reversed?

emiller42
Motivator

I'm trying to calculate volume growth by comparing the values of subsequent events from the df sourcetype. To get the current and previous values, I'm using streamstats like so:

index=os sourcetype=df host="HOST_NAME" | multikv | search MountedOn="VOLUME" | convert auto(UsePct) | streamstats current=f window=1 first(UsePct) as prevUsePct | table _time MountedOn prevUsePct UsePct

When I do this, the 'prevUsePct' value appears to be the UsePct value from the next record, not the previous one. So my output looks like this:

_time                           MountedOn       prevUsePct  UsePct  
2013-10-10T09:10:14.000-0500    /data/vol_253   58          58
2013-10-10T09:05:14.000-0500    /data/vol_253   58          58
2013-10-10T09:00:14.000-0500    /data/vol_253   58          57
2013-10-10T08:55:14.000-0500    /data/vol_253   57          57
2013-10-10T08:50:15.000-0500    /data/vol_253   57          57     

While I would expect to see something like this:


_time                           MountedOn       prevUsePct  UsePct
2013-10-10T09:10:14.000-0500    /data/vol_253   58          58
2013-10-10T09:05:14.000-0500    /data/vol_253   57          58
2013-10-10T09:00:14.000-0500    /data/vol_253   57          57
2013-10-10T08:55:14.000-0500    /data/vol_253   57          57
2013-10-10T08:50:15.000-0500    /data/vol_253   57          57

Hopefully this illustrates my concern. It appears that streamstats starts with the current record and looks ahead, not back as the documentation indicates.

Is this a bug, or am I misunderstanding the command?

1 Solution

cramasta
Builder

Just sort your data the opposite way (oldest first) before running streamstats.

index=os sourcetype=df host="HOST_NAME" | multikv | search MountedOn="VOLUME" | convert auto(UsePct) | sort _time | streamstats current=f window=1 first(UsePct) as prevUsePct | table _time MountedOn prevUsePct UsePct

You might even be able to use reverse instead:

index=os sourcetype=df host="HOST_NAME" | multikv | search MountedOn="VOLUME" | convert auto(UsePct) | reverse | streamstats current=f window=1 first(UsePct) as prevUsePct | table _time MountedOn prevUsePct UsePct
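
To see why the ordering matters, here is a minimal Python sketch of what `streamstats current=f window=1 first(UsePct)` does to a stream of values. It is a simplified model for illustration only, not Splunk's implementation; the function name `streamstats_prev` is hypothetical, and the sample values are the UsePct column from the question.

```python
from collections import deque

def streamstats_prev(values, window=1):
    """Simplified model of `streamstats current=f window=N first(x)`:
    for each event, return the first value among the up-to-N events
    that were streamed immediately before it."""
    seen = deque(maxlen=window)  # trailing window of already-processed events
    out = []
    for v in values:
        # current=f: the current event is not part of its own window
        out.append(seen[0] if seen else None)
        seen.append(v)
    return out

# UsePct in the order the search returns it: newest event first.
newest_first = [58, 58, 57, 57, 57]

# Each row picks up the value of the row above it, i.e. the LATER event.
as_returned = streamstats_prev(newest_first)

# `| reverse` (or `| sort _time`) streams the events oldest first instead,
# so the row "before" each event really is the earlier one in time.
oldest_first = list(reversed(newest_first))
fixed = streamstats_prev(oldest_first)

print(as_returned)  # [None, 58, 58, 57, 57]
print(fixed)        # [None, 57, 57, 57, 58]
```

In the model the very first streamed event has no predecessor, so it gets `None`; in the question's table that slot is filled by an event outside the five rows shown.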


gvnd
Path Finder

I downvoted this post because ardhyurszehz tfjuxt


gkanapathy
Splunk Employee

The first() function returns the first value encountered in the stream, not the first one in time. The function you really want is earliest(), but you can't just swap it in. Since Splunk returns events in reverse time order, you're of course seeing the opposite of what you want. I don't recommend using reverse, since it could mean you have to re-sort a very large data set before doing anything else. Instead, include your current event, increase the window size so that it also covers the "next" (i.e., earlier) event, and reference that event with the earliest() function:

... | streamstats current=t window=2 earliest(UsePct) as prevUsePct | ...

gkanapathy
Splunk Employee

Ah, I see. Yes, the window on streamstats is backwards. It is a "trailing" window, which means it covers the current event and the events seen "before" it, i.e., events that are later in time. So with current=t, last() will always refer to the current event. Unfortunately (and this is a common use case), this means that what you want has to be done with reverse, or else with something like:

... | streamstats current=t window=2 latest(_time) as time_new latest(MountedOn) as MountedOn_new latest(UsePct) as UsePct_new earliest(UsePct) as prevUsePct | ...

Except that kind of sucks.
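
A rough Python model of that `current=t window=2` approach, to show how the output rows end up describing the newer event of each pair. This is a sketch under assumptions: the function name `streamstats_pairs` is hypothetical, events are reduced to `(time, UsePct)` tuples with times shortened to `HH:MM`, and only the `latest()`/`earliest()` selection logic is modelled.

```python
def streamstats_pairs(events):
    """Simplified model of `streamstats current=t window=2 latest(...) earliest(...)`
    over (time, use_pct) tuples, in the order the events are streamed."""
    out = []
    prev = None
    for ev in events:
        # Trailing window: the current event plus the one streamed just before it.
        window = [e for e in (prev, ev) if e is not None]
        newest = max(window, key=lambda e: e[0])  # what latest() selects
        oldest = min(window, key=lambda e: e[0])  # what earliest() selects
        out.append({
            "time_new": newest[0],
            "UsePct_new": newest[1],
            "prevUsePct": oldest[1],
        })
        prev = ev
    return out

# Newest first, as the search returns them.
events = [("09:10", 58), ("09:05", 58), ("09:00", 57), ("08:55", 57)]
rows = streamstats_pairs(events)

for row in rows:
    print(row)
```

Each output row carries the newer event's time and UsePct in the `*_new` fields and the older event's UsePct as `prevUsePct`, so the `*_new` fields have to be used in place of the originals. The first streamed event has only itself in the window, so its row pairs the event with itself and is best ignored.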

emiller42
Motivator

Right, I'm aware of that. It's just something that's not intuitive to people when they first encounter it. I'm wondering if streamstats is using the same 'newest to oldest' logic. Also, the suggestion is something I've already tried, and it still behaves as seen above. If you include current, it gives you the current value of the event. If you don't include current, you get the value for the next event.


emiller42
Motivator

| reverse | does work! Although it seems like the command is working differently than described. (It could be similar to stats functions like first() and last(), which also behave counterintuitively.)
