Hi,
I need to find out distinct number of users over time per hour.
I have managed to reach the below query:
| timechart span=1h dc(user_id)
What the above query does is that, it finds distinct users for each one hour.
So it returns:
time . distinct_users
12:00 am - 152
1:00 am - 50
2:00 am - 64
and so on..
What I expect is, the counts should be always increasing.
so at 2:00 am above, I expect distinct users from 12:00 am to 2:00am
To do an accumulated dc(user)
, do it like this:
... | bin _time span=1h
| stats values(user) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)
Here is a run-anywhere example:
index=_*
| bin _time span=1h
| stats values(pid) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)
Why do you "expect" it to give an "always increasing" count?
By using span=1h
, you're only going to get distinct counts for that hour (ie from 0100 to 0200).
If you want the number of distinct users from 0000-0200, do the following:
| timechart span=2h dc(user_id)
If you want to see distinct users over a whole day, do:
| timechart span=1d dc(user_id)
You probably don't want to sum the values from 0000-0100 and 0100-0200, because then you could easily end up with the same user_id
counted more than once.
If you want an "always increasing" count per hour of unique user_id
counts, you could do something like the following:
index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
| stats dc(user_id) as distinctusers
| eval window="24-23 hours ago"
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-22h@
| stats dc(user_id) as distinctusers
| eval window="24-22 hours ago"
]
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-21h@
| stats dc(user_id) as distinctusers
| eval window="24-21 hours ago"
]
Etc etc
Might also try a multisearch
instead of append
:
| multisearch
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
| eval window="24 to 23 hours ago"
]
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
| eval window="24 to 22 hours ago"
]
....
| stats dc(user_id) as "Distinct Users" by window
@cbhattad
Try adding |streamstats sum(distinct_users) as count
to your search
Actually it wont help. It will sum the distinct users over hours, and the sum of distinct users might not be the actual count of "distinct users".
Eg:
10-11am 54
11-12 46
Your answer will give distinct as 100, where as distinct users can be just 65. because few of the same users might have logged in for 11-12 also