How to get statistical distribution of multivalue ...

JamesWierzba · ‎06-02-2023

I am starting with this query to show which types of products our top customers buy

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id

This will give some output like this:

user_id | values(product_category)
------------------------------------
1234 | clothing, clothing, clothing, food
2345 | electronics, electronics, food, food

I need one additional piece of information: the ratio of the product categories, for each user. I need this output instead. The format of the distributions is not important

user_id | values(product_category)
------------------------------------
1234 | clothing=75%, food=25%
2345 | electronics=50%, food=50%

The best idea I have thus far is something like this:

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id
| eval product_category_distribution = ???
| table user_id, product_category_distribution

but I can't find any such function ...

yuanliu · ‎06-02-2023

You count by category first, then device this count by total.

index=customer_data action=purchase
| top limit=100 user_id
``` the above should give you all purchases by top 100 users ```
| stats count by user_id product_category
| eventstats sum(count) as total by user_id
| eval percent = round(100 * count / total, 2)
| stats values(eval(product_category . "=" . percent . "%")) as product_category by user_id
| eval product_category = mvjoin(product_category)

Here, I fail to see why you need a subsearch to pick out top 100 when the top 100 command should suffice. But if that is needed, just add it to the top.

ITWhisperer · ‎06-02-2023

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) as product_category by user_id
| stats values(count) as total count by user_id product_category
| eval product_category=product_category."=".round(100*count/total,2)."%"
| stats values(product_category) as product_category by user_id
| eval product_category=mvjoin(product_category,", ")

How to get statistical distribution of multivalue entry in Splunk?

eval

Splunk Decoded: Service Maps vs Service Analyzer Tree View vs Flow Maps

What’s New in Splunk Observability – September 2025

Fun with Regular Expression - multiples of nine

Are you a member of the Splunk Community?

How to get statistical distribution of multivalue entry in Splunk?

eval

Splunk Decoded: Service Maps vs Service Analyzer Tree View vs Flow Maps

What’s New in Splunk Observability – September 2025

Fun with Regular Expression - multiples of nine