Splunk Search

How to get statistical distribution of multivalue entry in Splunk?

JamesWierzba
Observer

I am starting with this query to show which types of products our top customers buy

 

 

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id

 

 

This will give some output like this:

 

 

user_id | values(product_category)
------------------------------------
1234 | clothing, clothing, clothing, food
2345 | electronics, electronics, food, food

 

 

 

I need one additional piece of information: the ratio of the product categories, for each user. I need this output instead. The format of the distributions is not important

 

 

user_id | values(product_category)
------------------------------------
1234 | clothing=75%, food=25%
2345 | electronics=50%, food=50%

 

 

 

The best idea I have thus far is something like this:

 

 

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id
| eval product_category_distribution = ???
| table user_id, product_category_distribution

 

 

 

but I can't find any such function ...

 

 

Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

You count by category first, then device this count by total.

index=customer_data action=purchase
| top limit=100 user_id
``` the above should give you all purchases by top 100 users ```
| stats count by user_id product_category
| eventstats sum(count) as total by user_id
| eval percent = round(100 * count / total, 2)
| stats values(eval(product_category . "=" . percent . "%")) as product_category by user_id
| eval product_category = mvjoin(product_category)

Here, I fail to see why you need a subsearch to pick out top 100 when the top 100 command should suffice.  But if that is needed, just add it to the top.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) as product_category by user_id
| stats values(count) as total count by user_id product_category
| eval product_category=product_category."=".round(100*count/total,2)."%"
| stats values(product_category) as product_category by user_id
| eval product_category=mvjoin(product_category,", ")
0 Karma
Get Updates on the Splunk Community!

Splunk Decoded: Service Maps vs Service Analyzer Tree View vs Flow Maps

It’s Monday morning, and your phone is buzzing with alert escalations – your customer-facing portal is running ...

What’s New in Splunk Observability – September 2025

What's NewWe are excited to announce the latest enhancements to Splunk Observability, designed to help ITOps ...

Fun with Regular Expression - multiples of nine

Fun with Regular Expression - multiples of nineThis challenge was first posted on Slack #regex channel ...