Splunk Search

Cumulative Distinct over time from the start of the selected time range

cbhattad
Path Finder

Hi,
I need to find out distinct number of users over time per hour.

I have managed to reach the below query:
| timechart span=1h dc(user_id)

What the above query does is that, it finds distinct users for each one hour.
So it returns:

time . distinct_users
12:00 am - 152
1:00 am - 50
2:00 am - 64

and so on..

What I expect is, the counts should be always increasing.

so at 2:00 am above, I expect distinct users from 12:00 am to 2:00am

woodcock
Esteemed Legend

To do an accumulated dc(user), do it like this:

... | bin _time span=1h
| stats values(user) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)

Here is a run-anywhere example:

index=_* 
| bin _time span=1h
| stats values(pid) AS users BY _time
| streamstats values(users) AS users
| timechart span=1h dc(users)
0 Karma

wmyersas
Builder

Why do you "expect" it to give an "always increasing" count?

By using span=1h, you're only going to get distinct counts for that hour (ie from 0100 to 0200).

If you want the number of distinct users from 0000-0200, do the following:

| timechart span=2h dc(user_id)

If you want to see distinct users over a whole day, do:

| timechart span=1d dc(user_id)

You probably don't want to sum the values from 0000-0100 and 0100-0200, because then you could easily end up with the same user_id counted more than once.

If you want an "always increasing" count per hour of unique user_id counts, you could do something like the following:

index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
| stats dc(user_id) as distinctusers
| eval window="24-23 hours ago"
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-22h@
  | stats dc(user_id) as distinctusers
  | eval window="24-22 hours ago"
]
| append
[ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-21h@
  | stats dc(user_id) as distinctusers
  | eval window="24-21 hours ago"
]

Etc etc

Might also try a multisearch instead of append:

| multisearch
  [ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
    | eval window="24 to 23 hours ago"
  ]
  [ search index=ndx sourcetype=srctp user_id=* earliest=-24h@ latest=-23h@
    | eval window="24 to 22 hours ago"
  ]
  ....
| stats dc(user_id) as "Distinct Users" by window

renjith_nair
Legend

@cbhattad

Try adding |streamstats sum(distinct_users) as count to your search

Happy Splunking!

cbhattad
Path Finder

Actually it wont help. It will sum the distinct users over hours, and the sum of distinct users might not be the actual count of "distinct users".

Eg:
10-11am 54
11-12 46

Your answer will give distinct as 100, where as distinct users can be just 65. because few of the same users might have logged in for 11-12 also

Get Updates on the Splunk Community!

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...

New! Splunk Observability Search Enhancements for Splunk APM Services/Traces and ...

Regardless of where you are in Splunk Observability, you can search for relevant APM targets including service ...

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...