Splunk Search

How to get statistical distribution of multivalue entry in Splunk?

JamesWierzba
Observer

I am starting with this query to show which types of products our top customers buy

 

 

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id

 

 

This will give some output like this:

 

 

user_id | values(product_category)
------------------------------------
1234 | clothing, clothing, clothing, food
2345 | electronics, electronics, food, food

 

 

 

I need one additional piece of information: the ratio of the product categories, for each user. I need this output instead. The format of the distributions is not important

 

 

user_id | values(product_category)
------------------------------------
1234 | clothing=75%, food=25%
2345 | electronics=50%, food=50%

 

 

 

The best idea I have thus far is something like this:

 

 

``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) by user_id
| eval product_category_distribution = ???
| table user_id, product_category_distribution

 

 

 

but I can't find any such function ...

 

 

Labels (1)
0 Karma

yuanliu
SplunkTrust
SplunkTrust

You count by category first, then device this count by total.

index=customer_data action=purchase
| top limit=100 user_id
``` the above should give you all purchases by top 100 users ```
| stats count by user_id product_category
| eventstats sum(count) as total by user_id
| eval percent = round(100 * count / total, 2)
| stats values(eval(product_category . "=" . percent . "%")) as product_category by user_id
| eval product_category = mvjoin(product_category)

Here, I fail to see why you need a subsearch to pick out top 100 when the top 100 command should suffice.  But if that is needed, just add it to the top.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
``` get all purchases ```
index=customer_data action=purchase
``` narrow down to purchases from our top 100 customers (based on purchase count) ```
[search index=customer_data action=purchase top limit=100 user_id | table user_id]
``` show type of products that each user purchased ```
| stats count, list(product_category) as product_category by user_id
| stats values(count) as total count by user_id product_category
| eval product_category=product_category."=".round(100*count/total,2)."%"
| stats values(product_category) as product_category by user_id
| eval product_category=mvjoin(product_category,", ")
0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.
Get Updates on the Splunk Community!

Tech Talk Recap | Mastering Threat Hunting

Mastering Threat HuntingDive into the world of threat hunting, exploring the key differences between ...

Observability for AI Applications: Troubleshooting Latency

If you’re working with proprietary company data, you’re probably going to have a locally hosted LLM or many ...

Splunk AI Assistant for SPL vs. ChatGPT: Which One is Better?

In the age of AI, every tool promises to make our lives easier. From summarizing content to writing code, ...