Splunk Search

Cumulative Distinct over time from the start of the selected time range

Path Finder

Hi,
I need to find out distinct number of users over time per hour.

I have managed to reach the below query:
| timechart span=1h dc(user_id)

What the above query does is that, it finds distinct users for each one hour.
So it returns:

time . distinct_users
12:00 am - 152
1:00 am - 50
2:00 am - 64

and so on..

What I expect is, the counts should be always increasing.

so at 2:00 am above, I expect distinct users from 12:00 am to 2:00am

Esteemed Legend

To do an accumulated dc(user), do it like this:

... | bin _time span=1h
| stats values(user) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)

Here is a run-anywhere example:

index=_* 
| bin _time span=1h
| stats values(pid) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)
0 Karma

Builder

Why do you "expect" it to give an "always increasing" count?

By using span=1h, you're only going to get distinct counts for that hour (ie from 0100 to 0200).

If you want the number of distinct users from 0000-0200, do the following:

| timechart span=2h dc(user_id)

If you want to see distinct users over a whole day, do:

| timechart span=1d dc(user_id)

You probably don't want to sum the values from 0000-0100 and 0100-0200, because then you could easily end up with the same user_id counted more than once.

If you want an "always increasing" count per hour of unique user_id counts, you could do something like the following:

index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
| stats dc(user_id) as distinctusers
| eval window="24-23 hours ago"
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-22h@
  | stats dc(user_id) as distinctusers
  | eval window="24-22 hours ago"
]
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-21h@
  | stats dc(user_id) as distinctusers
  | eval window="24-21 hours ago"
]

Etc etc

Might also try a multisearch instead of append:

| multisearch
  [ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
    | eval window="24 to 23 hours ago"
  ]
  [ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
    | eval window="24 to 22 hours ago"
  ]
  ....
| stats dc(user_id) as "Distinct Users" by window

SplunkTrust
SplunkTrust

@cbhattad

Try adding |streamstats sum(distinct_users) as count to your search

Path Finder

Actually it wont help. It will sum the distinct users over hours, and the sum of distinct users might not be the actual count of "distinct users".

Eg:
10-11am 54
11-12 46

Your answer will give distinct as 100, where as distinct users can be just 65. because few of the same users might have logged in for 11-12 also