I was reading Example 3 in this tutorial, which deals with distinct_count().
I would like to know how distinct_count() behaves when it is applied in a timechart: does it count a value as distinct within a single time slice (i.e. count it again in the next time slice), or as distinct across the entire chart?
So, applied to Example 3, I think it would be
sourcetype=access_* action=purchase category_id=flowers | timechart dc(clientip)
I would expect this to generate a timechart of the count of distinct clientip values over time, i.e. count a user when they first purchase flowers and never count them again.
Is this what is happening, or does it count the user once in the first month, and then count them once again in the second month (assuming time slices are in months)?
The reason I am asking is that I want a time chart of the number of new users over time, so I do not want to count the same user ever again.
No. It tells you the number of different people in each group defined by the group-by clause (of which the time slice is a part), so a user who appears in two time slices is counted once in each. If you want just the number of new users at any time, it's easier to count only the first time you see each user:
... | stats earliest(clientip) as clientip | timechart count
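To see the per-bucket behaviour for yourself, here is a minimal sketch based on the search from the question. The one-month span and the `distinct_clients_in_month` alias are my own illustrative choices, not something from the tutorial:

```spl
sourcetype=access_* action=purchase category_id=flowers
| timechart span=1mon dc(clientip) AS distinct_clients_in_month
```

A clientip that purchases flowers in two different months contributes 1 to each of those months, so the monthly values cannot simply be added up to get the number of distinct clients over the whole period.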
I downvoted this post because it shows unique values per bucket, whereas the requester was looking for unique values across the whole search period.
Since this thread still seems to be active and unsolved, I'll post my solution. Expanding on the logic proposed by @gkanapathy, you can count the _time of the first occurrence of each new IP address:
| stats earliest(_time) as _time by clientip | timechart count(_time)
The only problem with this logic is that IP addresses that first appeared before the time range considered will still be counted as new at their first appearance inside the range, which mostly inflates the earliest time spans. It's a problem that matters less the longer your time range and span are but, honestly, I don't know whether it can be solved, or how.
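One possible mitigation, offered as a sketch rather than a definitive fix: search a window that starts well before the period you want to chart, work out each address's first occurrence over that longer window, and then discard first occurrences that fall in the lookback portion. The 90-day lookback, the 30-day reporting window, the daily span, and the `new_clients` alias below are all arbitrary values chosen for illustration:

```spl
sourcetype=access_* action=purchase category_id=flowers earliest=-90d@d latest=now
| stats earliest(_time) AS _time BY clientip
| where _time >= relative_time(now(), "-30d@d")
| timechart span=1d count AS new_clients
```

This only pushes the problem back: an address whose true first purchase happened before the lookback window started will still be counted as new if it reappears in the reporting window. A longer lookback reduces that error at the cost of a more expensive search.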