Archive

## Cumulative sum of a distinct count over time.

Contributor

I've got a variable, call it "flowers," related to orders from a shop. I'm trying to get a chart of the number of unique flower types purchased over a time range, as well as the cumulative total.

I've found another answer here which gave the hint of:

searchterm | stats values(flowers) as fcount dc(flowers) by date_minute | streamstats dc(fcount) as "Cumulative total"

And whilst that works, it breaks down the number of unique flowers per a given minute, across the date range -- all fine if that range is within a given hour, but, otherwise it's not quite the right way to break this down.

What I'm after though is a way to look at 36 hours worth of flower sales, and for each hour, or minute, see the number of unique flower types sold, but also have the cumulative total.

As I interpret the above example, it's recycling "minutes" (or hours if I use date_hour) so that the 12th minute across two days is only represented by a single value..

Missing something obvious here.....

Tags (1)
Splunk Employee
``````...
| sort _time
| streamstats global=f window=0 current=t
values(flowers) as v_flowers
| eval v_flowers=mvjoin(v_flowers,";")
| timechart span=1h dc(flowers) as cur_dc, last(v_flowers) as cum_dc
| eval cum_dc=mvcount(split(cum_dc,";"))
``````

The main tricks are (a) you need to sort and get the cumulative count first, and (b) convert the list of items from a multivalue field since it seems that the `timechart`'s `last()` function doesn't preserve multivalues.

Contributor

I think I follow the logic here, will have to experiment. I ran into a 10000 row limit on this one though. I did find a "hack" that seems to do it:

...| bin _time span=6h as hour | stats values(flowers) as cflowers dc(flowers) as "Flowers per hour" by hour | streamstats dc(cflowers) as "Cumulative total" | eval Time=strftime(hour, "%m-%d.%H") | table Time, "Flowers per hour", "Cumulative total"

Splunk Employee

I think you may want this :

``````... earliest=-36h | timechart span=1h dc(flower) AS "# of flower types purchased" | streamstats sum("# of flower types purchased") AS total
``````

I'm not sure how relevant the cumulative total will be though, as it is only representative of the sum of distinct counts over time, which obviously will have duplicate flower types.

You might find the results of this search more interesting as they will expose similar information :

``````... earliest=-36h | timechart span=1h count by flower | addtotals | appendcols [search ... | timechart span=1h dc(flower) AS "# of flower types purchased"]
``````

This will provide the number of purchases per flower type, the total number of purchases, and the distinct count of flower types purchased for every hour.

For more detais and examples, check the Search Reference manual records for timechart, appendcols and addtotals.

Splunk Employee

@howyagoin : Oh ok, you want a distinct count over the entire period, not an on-going cumulative sum of the item sales for each cycle.

Splunk Employee

Try what I came up and posted now.

Splunk Employee

hmm. maybe not. `streamstats` doesn't work with `sistats` as I had hoped.

Splunk Employee

This is a bit tricky, but possible. See my answer.

Splunk Employee

`... earliest=-36h | timechart span=1h count, dc(flower) AS "Distinct count of flower types purchased" | streamstats sum(count) AS "Total flowers purchased" | fields - count`