I would like to count unique users by day, week, and month, and I'm not sure what the preferred Splunk method for this is. I've been experimenting with different searches and summary indexes/accelerated reports, but I'm struggling to determine the most efficient solution.
I believe populating a summary index with a daily search such as
search | sistats dc(user_id) BY field1, field2, field3
will work. However, my concern is that since there are many unique users, the index could grow quite large. Changing the search to something like
search | stats count BY user_id, field1, field2, field3
is another option, but this would create so many events that I don't think it would be any more efficient than just searching the source events.
Are there any recommended solutions for counting distinct users over periods of time?

The broader question here, "what's the best way to get a distinct count of X for each value of foo and bar?", has the simple answer | stats dc(X) BY foo bar
But the question here is more about how to do this with summary indexing, which is complicated for distinct counts.
Populating a summary index with the results of a daily search like
search | sistats dc(user_id) BY field1, field2, field3
will work great if you only want to report on distinct counts at day granularity. But it won't work for week and month granularities. The reason is that the sistats command doesn't preserve the actual user_id values, just the distinct counts for each combination of fields on that day. As a result, it has no idea how many of the 150 users it saw on one day are the same users it saw on any other day.
On the other hand you could do
search | stats count BY user_id, field1, field2, field3
as your summary search. This would certainly work, but it does produce quite a lot of rows if there are many unique combinations of user_id, field1, field2, and field3. If the number of such combinations approaches the number of events in your raw data, that's a cardinal sin, and building this summary index is probably a bad idea: it won't be much faster than running the reports against the raw events.
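The upside of this approach is that, because the summary keeps the raw user_id values, a single report can roll it up at any granularity. A minimal sketch, assuming the summary lands in a hypothetical index called my_summary:
index=my_summary earliest=-1mon@mon latest=@mon | stats dc(user_id) BY field1, field2, field3
Swapping the earliest/latest time modifiers for -1d@d/@d or -1w@w/@w gives the daily and weekly versions of the same report.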
Sometimes a summary indexing use case gets overburdened. A telltale sign is when the summary-populating search carries a lot of fields, but no single report that runs against that summary uses more than a couple of them. In that case, finding the fault lines and splitting it into two or more summary-indexed searches can alleviate this kind of "runaway cross product" and restore good performance.
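As a rough sketch of such a split, assuming (hypothetically) that your reports only ever pair user_id with either field1 alone or with field2 and field3 together, two narrower summary searches like
search | stats count BY user_id, field1
search | stats count BY user_id, field2, field3
would each produce far fewer rows than the single four-field summary, since neither carries the full cross product of all the fields.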
And last but not least, there's the caveman approach: just run three summary index searches of
search | stats dc(user_id) BY field1, field2, field3
on three schedules: once a day, once a week, and once a month.
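A hedged sketch of those three searches, using inline time modifiers for illustration (in practice the time range would normally live in each saved search's schedule configuration rather than in the search string):
Daily: search earliest=-1d@d latest=@d | stats dc(user_id) BY field1, field2, field3
Weekly: search earliest=-1w@w latest=@w | stats dc(user_id) BY field1, field2, field3
Monthly: search earliest=-1mon@mon latest=@mon | stats dc(user_id) BY field1, field2, field3
Each run writes one pre-computed distinct count per field combination at its own granularity, at the cost of running three scheduled searches over the raw data.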

Thanks for the detailed response. In the end, I've gone with the second search for now. It creates about 10% as many events as the raw data, which is still more than I'd like, but I think it's sufficient for now. Maybe I'll set a short retention policy on this index to keep it from growing too large.
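For what it's worth, a retention policy like that is usually set with frozenTimePeriodInSecs in indexes.conf. A minimal sketch, assuming a hypothetical summary index named my_summary and a 90-day retention:
[my_summary]
homePath = $SPLUNK_DB/my_summary/db
coldPath = $SPLUNK_DB/my_summary/colddb
thawedPath = $SPLUNK_DB/my_summary/thaweddb
# buckets older than 90 days (7776000 seconds) roll to frozen, which deletes them by default
frozenTimePeriodInSecs = 7776000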
I use DistinctCount if I want to achieve a similar outcome.
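Assuming this refers to Splunk's distinct_count() stats function, which is simply the long-form alias of dc(), the equivalent search would be:
search | stats distinct_count(user_id) BY field1, field2, field3
It carries the same summary-indexing caveats as dc() discussed above.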
