Splunk Search

Cumulative Distinct over time from the start of the selected time range

cbhattad
Path Finder

Hi,
I need to find out distinct number of users over time per hour.

I have managed to reach the below query:
| timechart span=1h dc(user_id)

What the above query does is that, it finds distinct users for each one hour.
So it returns:

time . distinct_users
12:00 am - 152
1:00 am - 50
2:00 am - 64

and so on..

What I expect is, the counts should be always increasing.

so at 2:00 am above, I expect distinct users from 12:00 am to 2:00am

woodcock
Esteemed Legend

To do an accumulated dc(user), do it like this:

... | bin _time span=1h
| stats values(user) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)

Here is a run-anywhere example:

index=_* 
| bin _time span=1h
| stats values(pid) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)
0 Karma

wmyersas
Builder

Why do you "expect" it to give an "always increasing" count?

By using span=1h, you're only going to get distinct counts for that hour (ie from 0100 to 0200).

If you want the number of distinct users from 0000-0200, do the following:

| timechart span=2h dc(user_id)

If you want to see distinct users over a whole day, do:

| timechart span=1d dc(user_id)

You probably don't want to sum the values from 0000-0100 and 0100-0200, because then you could easily end up with the same user_id counted more than once.

If you want an "always increasing" count per hour of unique user_id counts, you could do something like the following:

index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
| stats dc(user_id) as distinctusers
| eval window="24-23 hours ago"
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-22h@
  | stats dc(user_id) as distinctusers
  | eval window="24-22 hours ago"
]
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-21h@
  | stats dc(user_id) as distinctusers
  | eval window="24-21 hours ago"
]

Etc etc

Might also try a multisearch instead of append:

| multisearch
  [ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
    | eval window="24 to 23 hours ago"
  ]
  [ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
    | eval window="24 to 22 hours ago"
  ]
  ....
| stats dc(user_id) as "Distinct Users" by window

renjith_nair
Legend

@cbhattad

Try adding |streamstats sum(distinct_users) as count to your search

---
What goes around comes around. If it helps, hit it with Karma 🙂

cbhattad
Path Finder

Actually it wont help. It will sum the distinct users over hours, and the sum of distinct users might not be the actual count of "distinct users".

Eg:
10-11am 54
11-12 46

Your answer will give distinct as 100, where as distinct users can be just 65. because few of the same users might have logged in for 11-12 also

Get Updates on the Splunk Community!

How to Get Started with Splunk Data Management Pipeline Builders (Edge Processor & ...

If you want to gain full control over your growing data volumes, check out Splunk’s Data Management pipeline ...

Out of the Box to Up And Running - Streamlined Observability for Your Cloud ...

  Tech Talk Streamlined Observability for Your Cloud Environment Register    Out of the Box to Up And Running ...

Splunk Smartness with Brandon Sternfield | Episode 3

Hello and welcome to another episode of "Splunk Smartness," the interview series where we explore the power of ...