Splunk Search

Get a distinct count of new values in a field per day with respect to all previous days in the time range

boo
Engager

I want a distinct count for a given field by day, but this count also needs to look at all previous days in the given time range to see the remaining distinct count?

For example:

When I search  for all events that are on Monday the result is : a,b,c,d,c,z
When I search for all events that are on Tuesday the result is : a,b,c,d,e,f,a,b
When I search for all events that are on Wednesday the result is : a,b,c,d,e,f,z

The distinct count for Monday is 5 and for Tuesday is 6 and for Wednesday it is 7. The  remaining distinct count for Tuesday would be 2, since a,b,c,d have all already appeared on Monday and the remaining distinct count for Wednesday would be 0 since all values have appeared on both Monday and Tuesday already.

How would I be able to do this?

Thanks

Labels (2)
0 Karma
1 Solution

boo
Engager

Thanks for the reply! No the count doesn't reset. 

I made a search that does what I need but it is quite slow:

index=_internal  earliest=-3d@d latest=-d@d
|reverse
|dedup sourcetype
|bucket _time span=1d
|stats dc(sourcetype) by _time

 The above search would make use of the fact that dedup goes by the first event that it sees, so if i were to reverse first then the first event i would see is the last event.

Cheers,

View solution in original post

Tags (1)

bowesmana
SplunkTrust
SplunkTrust

As an example of collection today's values not seen previously you can do the following

 

index=_internal  earliest=-3d@d latest=-d@d
  | stats count by sourcetype
| append [
  search index=_internal earliest=@d latest=now
  | stats count by sourcetype
]
| stats count by sourcetype
| where count=1

 

but I guess this is not quite what you're after.

I suspect you will need to look at streamstats

 

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Does the count ever reset, so the following week, if you get the same results, is the remaining DC always then 0, as you are looking back to include this week to determine what is 'new'

 

0 Karma

boo
Engager

Thanks for the reply! No the count doesn't reset. 

I made a search that does what I need but it is quite slow:

index=_internal  earliest=-3d@d latest=-d@d
|reverse
|dedup sourcetype
|bucket _time span=1d
|stats dc(sourcetype) by _time

 The above search would make use of the fact that dedup goes by the first event that it sees, so if i were to reverse first then the first event i would see is the last event.

Cheers,

Tags (1)

bowesmana
SplunkTrust
SplunkTrust

Hey cool, I worked out how to make it faster.

Rather than reverse all the data, first bucket and collect the values of the field, then expand that to rows and sort by ascending time. The data volume is likely to be much smaller and it achieves the same thing based on your dedup principle - nice usage @boo 

 

index=_internal  earliest=-7d@d latest=-d@d
|bucket _time span=1d
|stats values(sourcetype) as sourcetype dc(sourcetype)  by _time
| mvexpand sourcetype
| sort _time
| dedup sourcetype
|stats values(sourcetype) as sourcetype dc(sourcetype)  by _time

 

 I included the values(sourcetype) in the second stats, so I could see it working. In my test it reduced the time over the last 7 days from 50 second for 5.5 million rows to 15 seconds.

 

Get Updates on the Splunk Community!

New This Month - Splunk Observability updates and improvements for faster ...

What’s New? This month, we’re delivering several enhancements across Splunk Observability Cloud for faster and ...

What's New in Splunk Cloud Platform 9.3.2411?

Hey Splunky People! We are excited to share the latest updates in Splunk Cloud Platform 9.3.2411. This release ...

Buttercup Games: Further Dashboarding Techniques (Part 6)

This series of blogs assumes you have already completed the Splunk Enterprise Search Tutorial as it uses the ...