Hi everyone,
Basically, I am trying to count how many unique customers I had in a period, and that worked well with dc(clientid).
Until I wanted to know "How many unique customers do I have by product group?"
| stats dc(clientid) as Customers by year quarter product_toplevel product_lowerlevel
which, when looking at just one specific client who purchased 3 products within one group, can return entries like:
year | quarter | product_toplevel | product_lowerlevel | Customers |
2022 | 1 | Group A | Article 1 | 1 |
2022 | 1 | Group A | Article 2 | 1 |
2022 | 1 | Group A | Article 3 | 1 |
Now, Splunk is not wrong here. Looking at each product individually, there is 1 unique customer. However, if I were to sum up the Customers column, it would look like I have 3 customers in total.
I would much rather have it return the value "0.3" for "Customers" in the above example, so that I can export the table to Excel and work with pivot tables while retaining the 'correct' total of unique customers.
For that purpose I intend to use eventstats to create a sort of column total and then divide the value of Customers by that column total.
But I cannot figure out how to do it. Any ideas?
Your approach seems reasonable. "Generalized" stats can give you only one overall "division" across your data, so you can't have two different numbers (for example, one "by product" and another "by customer") using stats alone.
So counting with eventstats does seem to be the way to go, but I don't quite understand why you would like to have 0.3 (I assume you meant 1/3 = 0.33(3)) of a customer per row.
The general idea would be something like this:
| eventstats sum(Customers) as total by year
| eval Customers_per_total=Customers/total
But as I said, I fail to see the point of such a calculation.
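If the point is that the fractional values should add back up to the true number of distinct customers, another option (just a sketch, reusing the field names from the original search and starting from the base event search instead of the stats above) is to weight each client by the number of distinct products they bought within a group before the final stats:
| stats count by year quarter product_toplevel product_lowerlevel clientid
| eventstats dc(product_lowerlevel) as products_per_client by year quarter product_toplevel clientid
| eval weight=1/products_per_client
| stats sum(weight) as Customers by year quarter product_toplevel product_lowerlevel
In the example above this returns 0.33 for each of the three articles, and the Customers column sums back to 1 unique customer for Group A.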
Hey, thank you very much. Yes, you got it exactly right.
Also, for other people who might stumble across this thread:
When naming fields that are populated with numbers, like counted fields, it helps to use one-word names like UniqueCustomers or underscores like Unique_Customers. If you use "Unique Customers" like I did, eval will treat the value of that field as a string even though it is a number, and it will not do calculations.
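(For completeness, a small sketch of the quoting behavior: in eval, a field name containing spaces has to be wrapped in single quotes; double quotes create a string literal instead, so the calculation produces nothing.)
| stats dc(clientid) as "Unique Customers" by product_toplevel
| eval Doubled='Unique Customers'*2
| eval Broken="Unique Customers"*2
Here Doubled is calculated correctly, while Broken stays empty because "Unique Customers" is interpreted as a literal string rather than a field reference.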
Maybe I oversimplified the explanation of my report, but in the full report I group my customers by age, gender, region, zone within that region, customer type and sales channel. With all of that added, I fear that some groups of customers might be over- or underrepresented, and this calculation aims to compensate for that. Although maybe my thoughts take a wrong turn somewhere.