So, I've been away from Splunk for several years now, and now re-visiting it. I've got a scenario where I would like to track certain metrics from imported data. I created a simple CSV with just a few entries to demonstrate the issues I'm having.
Below is the source data I created:
customer_id | Time | customer_fname | customer_lname | products | product_prices |
111 | 12/1/2023 | John | Doe | product_100,product_200 | 100,200 |
222 | 12/11/2023 | Suzy | Que | product_100 | 100 |
333 | 12/15/2023 | Jack | Jones | product_300 | 300 |
111 | 12/18/2023 | John | Doe | product_400 | 400 |
In this scenario this is just examples of customers and the items they purchased and the price paid.
After uploading the file and displaying the data in a table it looks as expected:
source="test_sales.csv"
| table customer_id,customer_fname,customer_lname,products,product_prices
Upon using makemv to convert "products" and "product_prices" to multi-value fields, again the results are as expected and the product and price align since they were input into the source CSV in the proper order:
source="test_sales.csv"
| makemv delim="," products
| makemv delim="," product_prices
| table customer_id,customer_fname,customer_lname,products,product_prices
Here is where my issue is, Is there a way to tie the product for a purchase transaction in the multi-value "products" column to it's corresponding price in the multi-value "product_prices" column?
Everything seems to work except when I try to so something like listing the products by price for the multi-value fields like this:
source="test_sales.csv"
| makemv delim="," products
| makemv delim="," product_prices
| stats count(products) by products,product_prices
In the above results you can see that I'm getting results that are not exactly what I would want.
Ex. it shows:
3 instances of product_100 at a price of 100, should only be 2 instances
2 instances of product_100 at a price of 200, should be 0 instances of this combination
2 instances of product_200 at a price of 100, should be 0 instances of this combination
2 instances of product_200 at a price of 200, should only be 1 instance
I'm likely approaching this incorrectly or using the wrong tool for the task, any help to get me on the right track would be appreciated.
Thanks
Multivalued fields are separate entities which means Splunk doesn't keep any "connection" between values in those fields. For Splunk each field is just a single "multivalued value" (yes, I know it sounds bad ;-)).
So you have to manually combine those values. One solution @ITWhisperer already showed but for me it's a bit "brute force". My idea of a more "splunky" approach to splitting those products and product_prices would be to do
| eval zipped=mvzip(products,product_prices,":")
| mvexpand zipped
| eval zipped=split(zipped,":")
| eval products=mvindex(zipped,0), product_prices=mvindex(zipped,1)
Then you can do your stats
| eval row=mvrange(0,mvcount(products))
| mvexpand row
| eval products=mvindex(products, row)
| eval product_prices=mvindex(product_prices, row)
| stats count(products) by products,product_prices
@PickleRick @ITWhisperer Thanks for the responses, I tried both and they both give me the same result, but still not exactly what I had in mind in my head.
Here's the result:
What I'm trying to do is more along the lines of being able to get a count of products that sold at specific price points from transactions that may have multiple items purchased.
Maybe what I'm trying to do isn't really possible or the best approach to the problem, but the following table shows kinda what I'm trying to accomplish.
products | product_prices | count(products) |
product_100 | 100 | 2 |
product_200 | 200 | 1 |
product_300 | 300 | 1 |
product_400 | 400 | 1 |
Here is the result when I did it
Please share your search for when you didn't get the result you were expecting
Yep. My version works too
(of course if you do stats afterwards and not include the "zipped" field, it will vanish).