I'm using splunk to track events that happen with users in different treatments of a split test. For example, how often do users in treatment 1 register or perform a search vs. users in treatment 2 or treatment 3.
I can see the results to see the raw number of times each event occurred for each treatment using something like the following:
sourcetype=eventtracking | stats count by eventtype, treatment
Which produces something like this:
Search treatment1 900
Search treatment2 200
Login treatment1 135
Login treatment2 10
This works great when all the tests are equal - if there are 2 treatments at 50% each for example. But when the tests are unequal, for example if one treatment is at 10% and the other at 90%, it's hard to find meaning in the graphs, because the larger group will always have more events.
I can get a count of how many users are in each group using something like the following:
sourcetype=eventtracking | stats distinct_count(user_guid) as count by treatment
Which produces something like this:
treatment1 90
treatment2 10
I'd like to see the number of times an event occurred divided by the number of people in the test group, to even out the playing field. I've tried a number of different things with no luck. How can I combine these queries to see proportional results, something like this:
Search treatment1 10
Search treatment2 20
Login treatment1 1.5
Login treatment2 1
Thanks!
The most efficient way is to use eventstats
:
sourcetype=eventtracking
| stats count
dc(user_guid) as UserCount
by eventtype, treatment
| eventstats sum(UserCount) as UserCount by treatment
| eval pct=100*count/UserCount
The most efficient way is to use eventstats
:
sourcetype=eventtracking
| stats count
dc(user_guid) as UserCount
by eventtype, treatment
| eventstats sum(UserCount) as UserCount by treatment
| eval pct=100*count/UserCount
Sorry it's taken me a week to reply. But that's perfect. Gets exactly what I want. Thanks!
oh right. then you need:
sourcetype=eventtracking | stats count, values(user_guid) as UserList by eventtype,treatment | eventstats dc(UserList) as UserCount by treatment | eval pct=100*count/UserCount
Thanks for the suggestion. The problem there is that the distinct count is no longer distinct. A user will get counted once per eventtype, not once overall.
Thanks! I have been trying to figure this out for a couple of weeks, and of course just came up with the right answer shortly after posting. This version works across treatments without me having to hardcode the treatment names:
sourcetype=eventtracking | stats count as EventCount by treatment, eventtype | join treatment [search sourcetype=eventtracking | stats distinct_count(user_guid) as UserCount by treatment] | eval ratio = EventCount / UserCount
If you do an individual search for each treatment group and eventtype, i.e:
sourcetype=eventtracking eventype=search treatment=treatment1
then you could just do an eval on the two results:
sourcetype=eventtracking eventype=search treatment=treatment1 | stats distinct_count(user_guid) as Users | eval percentage=(Users/count)*100