How could I correlate values in multiple fields th...

digital_alchemy · ‎01-21-2024

So, I've been away from Splunk for several years now, and now re-visiting it. I've got a scenario where I would like to track certain metrics from imported data. I created a simple CSV with just a few entries to demonstrate the issues I'm having.

Below is the source data I created:

customer_id	Time	customer_fname	customer_lname	products	product_prices
111	12/1/2023	John	Doe	product_100,product_200	100,200
222	12/11/2023	Suzy	Que	product_100	100
333	12/15/2023	Jack	Jones	product_300	300
111	12/18/2023	John	Doe	product_400	400

In this scenario this is just examples of customers and the items they purchased and the price paid.

After uploading the file and displaying the data in a table it looks as expected:

source="test_sales.csv"
| table customer_id,customer_fname,customer_lname,products,product_prices

Upon using makemv to convert "products" and "product_prices" to multi-value fields, again the results are as expected and the product and price align since they were input into the source CSV in the proper order:

source="test_sales.csv"
| makemv delim="," products
| makemv delim="," product_prices
| table customer_id,customer_fname,customer_lname,products,product_prices

Here is where my issue is, Is there a way to tie the product for a purchase transaction in the multi-value "products" column to it's corresponding price in the multi-value "product_prices" column?

Everything seems to work except when I try to so something like listing the products by price for the multi-value fields like this:

source="test_sales.csv"
| makemv delim="," products
| makemv delim="," product_prices
| stats count(products) by products,product_prices

In the above results you can see that I'm getting results that are not exactly what I would want.

Ex. it shows:

3 instances of product_100 at a price of 100, should only be 2 instances

2 instances of product_100 at a price of 200, should be 0 instances of this combination

2 instances of product_200 at a price of 100, should be 0 instances of this combination

2 instances of product_200 at a price of 200, should only be 1 instance

I'm likely approaching this incorrectly or using the wrong tool for the task, any help to get me on the right track would be appreciated.

Thanks

PickleRick · ‎01-21-2024

Multivalued fields are separate entities which means Splunk doesn't keep any "connection" between values in those fields. For Splunk each field is just a single "multivalued value" (yes, I know it sounds bad ;-)).

So you have to manually combine those values. One solution @ITWhisperer already showed but for me it's a bit "brute force". My idea of a more "splunky" approach to splitting those products and product_prices would be to do

| eval zipped=mvzip(products,product_prices,":")
| mvexpand zipped
| eval zipped=split(zipped,":")
| eval products=mvindex(zipped,0), product_prices=mvindex(zipped,1)

Then you can do your stats

ITWhisperer · ‎01-21-2024

| eval row=mvrange(0,mvcount(products))
| mvexpand row
| eval products=mvindex(products, row)
| eval product_prices=mvindex(product_prices, row)
| stats count(products) by products,product_prices

digital_alchemy · ‎01-21-2024

@PickleRick @ITWhisperer Thanks for the responses, I tried both and they both give me the same result, but still not exactly what I had in mind in my head.

Here's the result:

What I'm trying to do is more along the lines of being able to get a count of products that sold at specific price points from transactions that may have multiple items purchased.

Maybe what I'm trying to do isn't really possible or the best approach to the problem, but the following table shows kinda what I'm trying to accomplish.

products	product_prices	count(products)
product_100	100	2
product_200	200	1
product_300	300	1
product_400	400	1

ITWhisperer · ‎01-22-2024

Here is the result when I did it

Please share your search for when you didn't get the result you were expecting

PickleRick · ‎01-22-2024

Yep. My version works too

(of course if you do stats afterwards and not include the "zipped" field, it will vanish).

How could I correlate values in multiple fields that contain multiple values

count

fields

stats

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

Are you a member of the Splunk Community?