Splunk Search

How to get statistical distribution of multivalue entry in Splunk?

JamesWierzba
Observer

I am starting with this query to show which types of products our top customers buy

 

 

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id

 

 

This will give some output like this:

 

 

user_id | values(product_category)
------------------------------------
1234 | clothing, clothing, clothing, food
2345 | electronics, electronics, food, food

 

 

 

I need one additional piece of information: the ratio of the product categories, for each user. I need this output instead. The format of the distributions is not important

 

 

user_id | values(product_category)
------------------------------------
1234 | clothing=75%, food=25%
2345 | electronics=50%, food=50%

 

 

 

The best idea I have thus far is something like this:

 

 

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id
| eval product_category_distribution = ???
| table user_id, product_category_distribution

 

 

 

but I can't find any such function ...

 

 

Labels (2)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

You count by category first, then device this count by total.

index=customer_data action=purchase
| top limit=100 user_id
``` the above should give you all purchases by top 100 users ```
| stats count by user_id product_category
| eventstats sum(count) as total by user_id
| eval percent = round(100 * count / total, 2)
| stats values(eval(product_category . "=" . percent . "%")) as product_category by user_id
| eval product_category = mvjoin(product_category)

Here, I fail to see why you need a subsearch to pick out top 100 when the top 100 command should suffice.  But if that is needed, just add it to the top.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) as product_category by user_id
| stats values(count) as total count by user_id product_category
| eval product_category=product_category."=".round(100*count/total,2)."%"
| stats values(product_category) as product_category by user_id
| eval product_category=mvjoin(product_category,", ")
0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...