Summary index results are limited

Explorer

I’m building a report that finds the number of unique users in our activity log each day:

sourcetype="accountTransaction" | timechart span="1d" dc(accountID)

The results are in the neighborhood of 12,000 each day.

This search takes forever to complete, so it seems like a perfect opportunity to use a summary index. So, I changed the search to this:

sourcetype="accountTransaction" | sitimechart span="1d" dc(accountID)

I saved it, scheduled it to run hourly, and enabled summary indexing. The job runs, but when I run this search against the summary index:

index=summary search_name="30-day DAU summary" |timechart span=1d dc(accountID)

The result (while nearly instantaneous) is dc(accountID)=1000 every single day – a flat line. Any idea what’s going on? Am I hitting a limit somewhere that I don’t know about?

Re: Summary index results are limited

Legend

First, do you ever need a distinct count of users over anything other than a single day? If so, do you need it for arbitrary periods or only for fixed ones? That is, would you need the count for some random 6-day period starting 43 days ago, or only for an entire month from the 1st to the end, or a week from Sunday through Saturday?

The reason this matters is that the si versions of stats and timechart can be very space-inefficient when you use dc(), because they must store enough information to let you aggregate up to any arbitrary interval. If you don't need that, you can save a lot of space by using plain stats to store just the specific periods you want. For example, in your case sitimechart/sistats would have to store about 12,000 items per day (each actual accountID), while plain stats would store only one entry (just the count). The trade-off is that you can't work out the distinct count over, say, three days from the three daily distinct counts, because a user who was active on more than one of those days would be counted more than once.
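To make the point about multi-day distinct counts concrete, here is a small Python sketch (not Splunk code). The day names and account IDs are made up for illustration; it only shows that stored daily counts cannot be combined into a three-day distinct count, while the underlying sets of IDs can.

```python
# Hypothetical account IDs seen on three consecutive days (made-up data).
daily_users = {
    "day1": {"a", "b", "c"},
    "day2": {"b", "c", "d"},
    "day3": {"c", "d", "e"},
}

# What a plain stats summary would keep: one distinct count per day.
daily_counts = {day: len(users) for day, users in daily_users.items()}

# Summing the daily counts double-counts users active on several days.
sum_of_daily_counts = sum(daily_counts.values())

# The true three-day distinct count needs the underlying IDs.
true_three_day_count = len(set().union(*daily_users.values()))

print(daily_counts)
print(sum_of_daily_counts, true_three_day_count)
```

The sum of the daily counts overshoots the true figure, which is why the si commands have to keep the individual values rather than just the counts.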

If you can use plain stats instead of sitimechart/sistats, you won't have your limits problem.

Now, the limits problem if you do use the si commands. Unfortunately, dc()/distinct_count() with si commands is limited in the number of distinct values it tracks. This limit is set by maxvalues in the [sistats] section of limits.conf. You can raise it, but if you raise it above 12,000 to accommodate your data, you will likely also need to increase the TRUNCATE limit on the [stash] sourcetype. Yes, it's getting complicated. I also don't know whether you'll bump into other limits when reporting back. Hopefully not.

You can avoid making these configuration changes if you use sistats count by user instead of sistats dc(user), but this will substantially slow down reporting and require you to slightly change your reporting queries.

Re: Summary index results are limited

Explorer

Thanks for the reply.

As to your questions - for the dashboard, I need a chart of unique users by day for the past 30 days. We wouldn't need the count over an arbitrary time window.

Ideally I should only have to run this search for each day - the unique user count isn't going to change historically.

I'll play around with "stats count by user" but I'm not sure how to handle uniques in that situation.

Re: Summary index results are limited

Legend

Well, ...|stats count by user | stats count is the same as ...|stats dc(user) as count with the exception that the former won't hit the same limits. So if you use ...|sistats count by user and add | stats count when you get the data back out of the summary index, you will have the distinct count.
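The equivalence can be checked outside Splunk with a small Python sketch. The event list is made up, and collections.Counter only stands in for stats count by user; this is an illustration of the counting logic, not of how Splunk implements either command.

```python
from collections import Counter

# Hypothetical events: one user value per logged transaction (made-up data).
events = ["alice", "bob", "alice", "carol", "bob", "alice"]

# Analogue of "stats count by user": one row per user with its event count.
count_by_user = Counter(events)

# Analogue of the trailing "stats count": count the rows, one per user.
rows = len(count_by_user)

# Analogue of "stats dc(user)": count the distinct values directly.
distinct = len(set(events))

print(count_by_user)
print(rows, distinct)
```

Both routes give the same number because the per-user table has exactly one row for each distinct user.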

Re: Summary index results are limited

Explorer

I see - many thanks!
