Solved: How to use (event-)stats and calculations to predi...

nord_nikon · ‎07-08-2022

Hi everyone,

basically I am trying to count how many unique customers I had in a period and that worked well with dc(clientid).

Until I wanted to know "How many unique Customers do I have by product group"

|stats dc(clientid) as Customers by year quarter product_toplevel product_lowerlevel

which, when for example looking at only 1 specific client who purchased 3 products within one group can return entries like

year	quarter	product_toplevel	product_lowerlevel	Customers
2022	1	Group A	Article 1	1
2022	1	Group A	Article 2	1
2022	1	Group A	Article 3	1

Now Splunk is not wrong here. Looking at each product there is 1 unique customer. However if I were to sum up the Customers-column it would look like I have 3 Customers in total.

I would much rather have it return the value "0.3" for "Customers" in the above example so that I can export the table to excel and work with pivot-tables while retaining the 'correct' total of unique customers.

For that purpose I intend to use eventstats to create a sort of column-total and then divide the value of Customers by that column total.

But I can not figure out how to do it. Any ideas?

PickleRick · ‎07-08-2022

Your approach seems reasonable. "Generalized" stats can give you only one overall "division" across your data. So you can't have two different numbers (for example - one "by product" and another "by customer" only using stats alone).

So counting with eventstats does seem the way to go but I don't quite understand why you would like to have 0.3 (I assume you meant 1/3=0.33(3)) customer per row.

The general idea would be something like that:

| eventstats dc(Customers) as Customers count by year
| eval Customers_per_total=Customers/count

But as I said - I fail to see the point in such calculation.

View solution in original post

nord_nikon · ‎07-12-2022

Hey, thank you very much. Yes, you got it exactly right.

Also for other people who might stumble accros this thread:
When naming fields that are populated with numbers like counted fields it helps to use 1-word names like UniqueCustomers or use underscores like Unique_Customers. If you use "Unique Customers" like I did eval will treat the values of that fields as string even though it is a number and it will not do calculations.

Maybe I oversimplified the explanation of my report but in the full report I group my customers by age, gender, regions, zones within that region, Customer type and Sales Channel. With all that added I fear that some groups of cusotmers might be over- or underrepresented and this calculation aims to compensate for that. Although maybe my thoughts take a wrong turn somewhere.

PickleRick · ‎07-08-2022

Your approach seems reasonable. "Generalized" stats can give you only one overall "division" across your data. So you can't have two different numbers (for example - one "by product" and another "by customer" only using stats alone).

So counting with eventstats does seem the way to go but I don't quite understand why you would like to have 0.3 (I assume you meant 1/3=0.33(3)) customer per row.

The general idea would be something like that:

| eventstats dc(Customers) as Customers count by year
| eval Customers_per_total=Customers/count

But as I said - I fail to see the point in such calculation.

How to use (event-)stats and calculations to predict how many unique customers do I have by product group

count

stats

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Join the Conversation