I need help figuring out how to store visitor session info in a summary index.
First, what I want to be able to do: query the summary index and return how many visitor sessions our site had in x-time. I also want to return how many visitor sessions a particular path (/path/y) had within x-time.
I tried building a saved search that wrote to the summary index, with web events grouped into transactions by ip and user agent:
host="webserver" | transaction ip UserAgent maxspan=30m | stats values(uri_stem) by ip UserAgent
This appeared to work until I noticed that each day had exactly 10,000 visitor sessions. The query was hitting a limit. This made me realize that dumping all of the data into the summary index doesn't actually gain me anything, since nothing is being summarized 🙂
The other approach I considered was to calculate the number of visitor sessions per day and save that into the summary index. The problem here is that I would only be able to tell how many sessions there were for the whole site; I couldn't get sessions for just a subsection.
Any ideas on how to get both searches to work from the same summary index data? I don't want to have to set up a new summary index search every time someone thinks of something new to search for.
You are correct that storing the full information about every (user, uri_stem) pair isn't going to be much better than searching your raw data. Another problem is that if you store only distinct counts rather than the distinct user ids, you can't combine several time periods (say, the hours in a day) to form a whole (the whole day), because a visitor who shows up in more than one period would be counted more than once.
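To make the overlap problem concrete, here is a minimal Python sketch (not Splunk code). The hours and visitor ids are made-up sample data; it only illustrates why per-period distinct counts cannot simply be added together.

```python
# Hypothetical visitor ids (ip + UserAgent) seen in each hour of one day.
hourly_visitors = {
    "09:00": {"1.1.1.1 Firefox", "2.2.2.2 Chrome"},
    "10:00": {"2.2.2.2 Chrome", "3.3.3.3 Safari"},
}

# What a summary holding only a distinct count per hour would contain.
hourly_counts = {hour: len(uids) for hour, uids in hourly_visitors.items()}

# Adding the hourly counts counts the returning visitor twice.
summed = sum(hourly_counts.values())

# The real daily distinct count needs the ids themselves.
true_daily = len(set().union(*hourly_visitors.values()))

print(summed, true_daily)
```

The summed figure is larger than the true daily count because one visitor appears in both hours, which is why the summary has to be computed at the granularity you intend to report on.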
You should choose the time granularities that you need to report on and summarize distinct counts for each uri_stem and the site as a whole for that period, and persist this into the summary index.
Your search would be:
... | eval uid = ip + UserAgent | stats dc(uid) as visitors by uri_stem
Then reporting on this would be:
index=summary source=search_name | stats max(visitors) by uri_stem
If you want to look at the site as a whole, you can't add up the per-uri_stem counts, since a visitor who hits several paths would be counted once per path. However, you can save off a row in the summary for "ALL" as follows using multivalued fields:
... | eval uid = ip + UserAgent | eval uri_stem = uri_stem + " ALL" | makemv uri_stem | stats dc(uid) as visitors by uri_stem
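If it helps to see what that search is doing, here is a rough Python sketch of the same idea, under the assumption that each event ends up counted once under its own uri_stem and once under "ALL". The events and the function name visitors_by_uri_stem are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical events: (ip, UserAgent, uri_stem).
events = [
    ("1.1.1.1", "Firefox", "/path/y"),
    ("1.1.1.1", "Firefox", "/path/z"),
    ("2.2.2.2", "Chrome", "/path/y"),
]


def visitors_by_uri_stem(events):
    """Distinct visitors per uri_stem, plus a site-wide "ALL" row."""
    seen = defaultdict(set)
    for ip, user_agent, uri_stem in events:
        uid = ip + user_agent
        # Mimic the multivalued field: the event counts toward
        # its own path and toward "ALL".
        for key in (uri_stem, "ALL"):
            seen[key].add(uid)
    return {key: len(uids) for key, uids in seen.items()}


counts = visitors_by_uri_stem(events)
print(counts)
```

In this sample the per-path counts add up to 3 while the "ALL" row is 2, because one visitor hit both paths. That is the overlap the separate "ALL" row avoids.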