Splunk Search

How to use (event)stats and calculations to count how many unique customers I have by product group

nord_nikon
Engager

Hi everyone,

Basically, I am trying to count how many unique customers I had in a period, and that worked well with dc(clientid).

Until I wanted to know "How many unique customers do I have by product group?"

| stats dc(clientid) as Customers by year quarter product_toplevel product_lowerlevel

which, when looking at, for example, only 1 specific client who purchased 3 products within one group, can return entries like

year  quarter  product_toplevel  product_lowerlevel  Customers
2022  1        Group A           Article 1           1
2022  1        Group A           Article 2           1
2022  1        Group A           Article 3           1

Now, Splunk is not wrong here. Looking at each product, there is 1 unique customer. However, if I were to sum up the Customers column, it would look like I have 3 customers in total.

I would much rather have it return the value "0.3" for Customers in the above example, so that I can export the table to Excel and work with pivot tables while retaining the 'correct' total of unique customers.
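(To be clear why I can't simply drop the lower level from the grouping: a coarser search like

| stats dc(clientid) as Customers by year quarter product_toplevel

would return just one row, "2022 1 Group A 1", for the example above - the correct total, but without the per-article breakdown I need.)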

For that purpose I intend to use eventstats to create a sort of column total and then divide the value of Customers by that column total.

But I cannot figure out how to do it. Any ideas?

1 Solution

PickleRick
SplunkTrust

Your approach seems reasonable. "Generalized" stats can give you only one overall "division" across your data, so you can't have two different numbers (for example, one "by product" and another "by customer") using stats alone.

So counting with eventstats does seem the way to go, but I don't quite understand why you would like to have 0.3 (I assume you meant 1/3 = 0.33(3)) of a customer per row.

The general idea would be something like this:

| eventstats dc(clientid) as Customers count by year
| eval Customers_per_total=Customers/count

But as I said - I fail to see the point in such a calculation.
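Wired into your original search, the "column total" you describe could look something like this (just a sketch - which fields the total should be grouped by is a guess on my part):

| stats dc(clientid) as Customers by year quarter product_toplevel product_lowerlevel
| eventstats sum(Customers) as column_total by year quarter product_toplevel
| eval Customers_fraction=Customers/column_total

In your three-row example each row gets 1/3, so the column sums back to 1 unique customer for Group A.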


nord_nikon
Engager

Hey, thank you very much. Yes, you got it exactly right.

Also, for other people who might stumble across this thread:
When naming fields that are populated with numbers, such as counted fields, it helps to use one-word names like UniqueCustomers or underscored names like Unique_Customers. If you use "Unique Customers" like I did, eval will treat the values of that field as strings even though they are numbers, and it will not do calculations.
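(That said, if you are stuck with a space in the name, eval can still reference the field if you wrap it in single quotes - double quotes would make it a string literal:

| eval Customers_per_total='Unique Customers'/count

but a space-free name avoids the problem entirely.)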


Maybe I oversimplified the explanation of my report, but in the full report I group my customers by age, gender, region, zones within that region, customer type and sales channel. With all that added, I fear that some groups of customers might be over- or underrepresented, and this calculation aims to compensate for that. Although maybe my thoughts take a wrong turn somewhere.

