Does distinct_count() in a timechart count unique instances per time slice or per chart?

Explorer

Hi,

I was reading Example 3 in this tutorial, which deals with distinct_count().

I would like to know what happens when you apply distinct_count() to a timechart: does it count a value as distinct within a single time slice (so the same value is counted again in the next time slice), or as distinct across the entire chart?

So, applied to Example 3, I think it would be

sourcetype=access_* action=purchase category_id=flowers | timechart dc(clientip)

I would expect this to generate a timechart with the count of distinct/unique clientip values over time, i.e. to count a user when they first purchased flowers and never count them again.

Is this what is happening, or does it count the user once in the first month, and then count them once again in the second month (assuming time slices are in months)?

The reason I am asking is that I want a time chart of the number of new users over time, so I do not want to count the same user ever again.

Many thanks.


Re: Does distinct_count() in a timechart count unique instances per time slice or per chart?

Splunk Employee

No. It tells you the number of different people in each group-by bucket, and the time slice is part of that grouping, so a user who appears in two time slices is counted once in each. If you just want the number of new users at any time, it's easier to count only the first time you see each user:

... | stats earliest(_time) as _time by clientip | timechart count

Re: Does distinct_count() in a timechart count unique instances per time slice or per chart?

New Member

I think ... | timechart dc(clientIP) as clientIP is a better option.


Re: Does distinct_count() in a timechart count unique instances per time slice or per chart?

Explorer

this one works! thank you!!


Re: Does distinct_count() in a timechart count unique instances per time slice or per chart?

Explorer

I downvoted this post because this shows unique per bucket, not per search period, which is what the requester was looking for.


Re: Does distinct_count() in a timechart count unique instances per time slice or per chart?

Builder

this works too 🙂


Re: Does distinct_count() in a timechart count unique instances per time slice or per chart?

Path Finder

Since this thread still seems to be active and unsolved, I'll post my solution. Building on the logic proposed by @gkanapathy, you can take the _time of each IP address's first occurrence and count those:

| stats earliest(_time) as _time by clientip | timechart count(_time)

The only problem with this logic is that IP addresses that first appeared before the start of the searched time range will be counted as new in the time span where they first show up inside the range, typically the first one. The problem matters less the longer your time range and span are but, honestly, I don't know whether it can be solved completely, or how.
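
One partial workaround, offered as a sketch rather than a tested fix: search over a longer window than the one you want to chart, then drop the IP addresses whose first event falls before the charted window. The 90-day lookback, the 30-day charted window, the 1-day span and the new_users field name are all assumed values chosen for illustration; the base search is the one from the original question.

```spl
sourcetype=access_* action=purchase category_id=flowers earliest=-90d@d latest=now
| stats earliest(_time) as _time by clientip
| where _time >= relative_time(now(), "-30d@d")
| timechart span=1d count as new_users
```

An IP address that was last seen before the lookback window starts would still be counted as new, so this only pushes the problem further back in time rather than removing it.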
