I want a distinct count for a given field by day, but the count also needs to look at all previous days in the given time range so that I can see the remaining (newly seen) distinct count.
For example:
When I search for all events that are on Monday, the result is: a,b,c,d,c,z
When I search for all events that are on Tuesday, the result is: a,b,c,d,e,f,a,b
When I search for all events that are on Wednesday, the result is: a,b,c,d,e,f,z
The distinct count for Monday is 5, for Tuesday it is 6, and for Wednesday it is 7. The remaining distinct count for Tuesday would be 2, since a, b, c, and d have all already appeared on Monday, and the remaining distinct count for Wednesday would be 0, since all of its values have already appeared on Monday or Tuesday.
How would I be able to do this?
Thanks
As an example of collecting today's values that were not seen previously, you can do the following:
index=_internal earliest=-3d@d latest=-d@d
| stats count by sourcetype
| append [
search index=_internal earliest=@d latest=now
| stats count by sourcetype
]
| stats count by sourcetype
| where count=1
but I guess this is not quite what you're after - with count=1 it would also return a sourcetype that was seen on previous days but not today.
I suspect you will need to look at streamstats.
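For example, here is a minimal sketch of that streamstats approach, using the same _internal/sourcetype data as the search above (untested, and daily_dc/remaining_dc are just illustrative field names):
index=_internal earliest=-3d@d latest=@d
| bucket _time span=1d
| stats values(sourcetype) as sourcetype by _time
| mvexpand sourcetype
| streamstats count by sourcetype
| eval is_new=if(count=1, 1, 0)
| stats dc(sourcetype) as daily_dc sum(is_new) as remaining_dc by _time
Because stats sorts its output by _time ascending, streamstats sees each value's earliest day first, so count=1 flags newly seen values; daily_dc is then the per-day distinct count and remaining_dc the count of values not seen on any earlier day.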
Does the count ever reset? So if you get the same results the following week, is the remaining DC then always 0, since you are looking back to include this week when determining what is 'new'?
Thanks for the reply! No the count doesn't reset.
I made a search that does what I need but it is quite slow:
index=_internal earliest=-3d@d latest=-d@d
| reverse
| dedup sourcetype
| bucket _time span=1d
| stats dc(sourcetype) by _time
The above search makes use of the fact that dedup keeps the first event it sees: since Splunk returns events newest first, reversing them first means the first event dedup sees for each value is its chronologically earliest occurrence.
Cheers,
Hey cool, I worked out how to make it faster.
Rather than reversing all the data, first bucket and collect the values of the field, then expand that to rows and sort by ascending time. The data volume is likely to be much smaller, and it achieves the same thing based on your dedup principle - nice usage @boo
index=_internal earliest=-7d@d latest=-d@d
| bucket _time span=1d
| stats values(sourcetype) as sourcetype dc(sourcetype) by _time
| mvexpand sourcetype
| sort 0 _time
| dedup sourcetype
| stats values(sourcetype) as sourcetype dc(sourcetype) by _time
Note the 0 on the sort - without it, sort silently truncates to 10,000 results, which could bite on a wider time range.
I included the values(sourcetype) in the second stats so I could see it working. In my test it reduced the run time over the last 7 days from 50 seconds for 5.5 million rows to 15 seconds.
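If you also need the per-day distinct count alongside the newly seen count, one possible variant (an untested sketch; daily_dc and new_dc are just illustrative field names) is to name the first dc() and carry it through:
index=_internal earliest=-7d@d latest=-d@d
| bucket _time span=1d
| stats values(sourcetype) as sourcetype dc(sourcetype) as daily_dc by _time
| mvexpand sourcetype
| sort 0 _time
| dedup sourcetype
| stats dc(sourcetype) as new_dc max(daily_dc) as daily_dc by _time
Since daily_dc is constant across a given day's rows, max() simply carries it through. One caveat: a day on which every value has already been seen loses all of its rows at the dedup, so it drops out of the results entirely rather than showing new_dc=0.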