Splunk Search

How to use (event)stats and calculations to count how many unique customers I have by product group

nord_nikon
Engager

Hi everyone,

Basically, I am trying to count how many unique customers I had in a period, and that worked well with dc(clientid).
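
That overall number came from a plain distinct count over the whole period, something along these lines (a sketch only - the real search has its own base search in front of it):

| stats dc(clientid) as Customers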

Until I wanted to know: "How many unique customers do I have by product group?"

| stats dc(clientid) as Customers by year quarter product_toplevel product_lowerlevel

which, when looking for example at one specific client who purchased 3 products within one group, can return entries like:

year  quarter  product_toplevel  product_lowerlevel  Customers
2022  1        Group A           Article 1           1
2022  1        Group A           Article 2           1
2022  1        Group A           Article 3           1

Now Splunk is not wrong here: looking at each product, there is 1 unique customer. However, if I were to sum up the Customers column, it would look like I have 3 customers in total.

I would much rather have it return the value "0.3" for Customers in the above example, so that I can export the table to Excel and work with pivot tables while retaining the 'correct' total of unique customers.

For that purpose I intend to use eventstats to create a sort of column total and then divide the value of Customers by that column total.
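
In pseudo-SPL, the shape I have in mind is something like this (column_total is a made-up name):

| eventstats sum(Customers) as column_total by year quarter
| eval Customers=Customers/column_total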

But I cannot figure out how to do it. Any ideas?

1 Solution

PickleRick
SplunkTrust

Your approach seems reasonable. "Generalized" stats can give you only one overall "division" across your data, so you can't have two different numbers (for example, one "by product" and another "by customer") using stats alone.

So counting with eventstats does seem to be the way to go, but I don't quite understand why you would like to have 0.3 (I assume you meant 1/3 = 0.33(3)) customers per row.

The general idea would be something like this:

| eventstats dc(Customers) as Customers count by year
| eval Customers_per_total=Customers/count

But as I said - I fail to see the point in such calculation.
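
For the record, a minimal sketch of how such fractional weights could be computed - assuming the raw events carry clientid and the grouping fields from the question, with rows_per_client and weight as made-up names:

| stats count by year quarter product_toplevel product_lowerlevel clientid ``` one row per client and article ```
| eventstats count as rows_per_client by year quarter clientid ``` how many articles this client bought this quarter ```
| eval weight=1/rows_per_client ``` each client contributes 1 in total ```
| stats sum(weight) as Customers by year quarter product_toplevel product_lowerlevel

Summing the resulting Customers column within a quarter then reproduces dc(clientid) for that quarter.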


nord_nikon
Engager

Hey, thank you very much. Yes, you got it exactly right.

Also, for other people who might stumble across this thread:
When naming fields that are populated with numbers, such as counted fields, it helps to use one-word names like UniqueCustomers, or to use underscores, like Unique_Customers. If you use "Unique Customers" like I did, eval will treat that reference as a string rather than a field, even though the value is a number, and it will not do calculations.
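
(A side-note sketch, with ratio and Total as made-up names: eval can still reference a field name that contains spaces by wrapping it in single quotes, e.g.

| eval ratio='Unique Customers'/Total

but one-word names avoid the issue entirely.)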


Maybe I oversimplified the explanation of my report, but in the full report I group my customers by age, gender, region, zones within that region, customer type, and sales channel. With all that added, I fear that some groups of customers might be over- or underrepresented, and this calculation aims to compensate for that. Although maybe my thoughts take a wrong turn somewhere.
