Splunk Search

I need help joining two cvs (customers and purchases) that shows what was purchased and if certain products were not purchased

cromm
Explorer

The two csv files I have are customers (fields= customerName,customerID,region,IsActive) with one row per customer and the second is Purchases (fields=customerName, customerID, productName, purchaseDate) with one row per purchase. I am trying to create a list of all the active customers in Northwest region and when they first purchased 5 specific products (the purchases csv include 100+ different products but I only care about these 5). Customers could have bought 0, 1, 2, 3, 4 or all 5 of the target products but after the join I want the customer record to show up five times (once for each product) regardless of it was purchased.

I was thinking of duplicating each customer in the Northwest 4 times and adding a product field with the 5 products (1 per row for each customer) but I don't know how to do that or if it makes sense.

currently this is what I have:
| from inputlookup:"customer.csv"
| where (Region="NW") AND (IsActive="True")
|join customerID type=left max=0
[|inputlookup Purchases.csv
|where (productName="Train" OR productName="Car" OR productName="Truck" OR productName="Bike" OR productName="Scooter")]

This gets me (Tom bought 2 products, Jill bought all 5 and Ken bought none):
customerName,customerID,region,IsActive,customerName, customerID, productName, purchaseDate
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7

Ken,3,NW,True,Ken,3,,,

What I want is (all customers to show up 5 times so Tom would have 3 rows without a purchase date, Jill would stay the same and Ken would have 5 rows without dates):
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Tom,1,NW,True,Tom,1,Truck,
Tom,1,NW,True,Tom,1,Bike,
Tom,1,NW,True,Tom,1,Scooter,
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7
Ken,3,NW,True,Ken,3,Train,
Ken,3,NW,True,Ken,3,Car,
Ken,3,NW,True,Ken,3,Truck,
Ken,3,NW,True,Ken,3,Bike,
Ken,3,NW,True,Ken,3,Scooter,

0 Karma
1 Solution

somesoni2
Revered Legend

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

View solution in original post

somesoni2
Revered Legend

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

cromm
Explorer

Thanks!! This is almost perfect. There are some customers who bought two different versions of the same product and I only want most recent version. I tried to remove it with:

|eval CustProd= customerName+productName
|dedup UserCert sortby -purchaseDate

but it didn't work. Do you have any ideas of how to get rid of the extra values?

0 Karma
Get Updates on the Splunk Community!

Improve Your Security Posture

Watch NowImprove Your Security PostureCustomers are at the center of everything we do at Splunk and security ...

Maximize the Value from Microsoft Defender with Splunk

 Watch NowJoin Splunk and Sens Consulting for this Security Edition Tech TalkWho should attend:  Security ...

This Week's Community Digest - Splunk Community Happenings [6.27.22]

Get the latest news and updates from the Splunk Community here! News From Splunk Answers ✍️ Splunk Answers is ...