Splunk Search

How to use (event)stats and calculations to determine how many unique customers I have by product group

nord_nikon
Engager

Hi everyone,

Basically, I am trying to count how many unique customers I had in a period, and that worked well with dc(clientid).

Until I wanted to know "How many unique customers do I have by product group?"

| stats dc(clientid) as Customers by year quarter product_toplevel product_lowerlevel

which, when looking for example at only one specific client who purchased 3 products within one group, can return entries like:

year  quarter  product_toplevel  product_lowerlevel  Customers
2022  1        Group A           Article 1           1
2022  1        Group A           Article 2           1
2022  1        Group A           Article 3           1

Now, Splunk is not wrong here: looking at each product, there is 1 unique customer. However, if I were to sum up the Customers column, it would look like I have 3 customers in total.

I would much rather have it return the value "0.3" for "Customers" in the above example, so that I can export the table to Excel and work with pivot tables while retaining the 'correct' total of unique customers.

For that purpose I intend to use eventstats to create a sort of column total and then divide the value of Customers by that column total.

But I cannot figure out how to do it. Any ideas?

1 Solution

PickleRick
SplunkTrust

Your approach seems reasonable. "Generalized" stats can give you only one overall "division" across your data, so you can't have two different numbers (for example, one "by product" and another "by customer") using stats alone.

So counting with eventstats does seem to be the way to go, but I don't quite understand why you would like to have 0.3 (I assume you meant 1/3 = 0.33(3)) customers per row.

The general idea would be something like this:

| eventstats count by year
| eval Customers_per_total=Customers/count

But as I said, I fail to see the point of such a calculation.
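That said, if the goal is for the per-row fractions to add up exactly to the distinct customer count of each product group, one way is to weight each client by the number of articles they bought within the group before the final stats. A sketch, assuming the fields from the original search (clientid, year, quarter, product_toplevel, product_lowerlevel):

| stats count by year quarter product_toplevel product_lowerlevel clientid
| eventstats count as articles_per_client by year quarter product_toplevel clientid
| eval weight=1/articles_per_client
| stats sum(weight) as Customers by year quarter product_toplevel product_lowerlevel

Here the first stats reduces the events to one row per client and article, eventstats counts how many distinct articles each client bought within a group, and each client then contributes a total weight of 1 spread across their articles, so summing the Customers column within a group reproduces dc(clientid) for that group.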


nord_nikon
Engager

Hey, thank you very much. Yes, you got it exactly right.

Also, for other people who might stumble across this thread:
When naming fields that hold numbers, like counted fields, it helps to use one-word names like UniqueCustomers or underscores like Unique_Customers. If you use "Unique Customers" like I did, eval will treat that as a string rather than a field reference (unless you wrap the name in single quotes), and it will not do the calculation.
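To illustrate (a self-contained toy example with made-up field names; in eval, single quotes refer to a field while double quotes create a string literal):

| makeresults
| eval UniqueCustomers=3, total=9
| rename UniqueCustomers as "Unique Customers"
| eval share='Unique Customers'/total

The last eval only works because the spaced field name is wrapped in single quotes; written unquoted it would not parse, which is why single-word or underscore names are the safer habit.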


Maybe I oversimplified the explanation of my report, but in the full report I group my customers by age, gender, region, zones within that region, customer type and sales channel. With all that added, I fear that some groups of customers might be over- or underrepresented, and this calculation aims to compensate for that. Although maybe my thoughts take a wrong turn somewhere.


