Splunk Search

I need help joining two cvs (customers and purchases) that shows what was purchased and if certain products were not purchased

cromm
Explorer

The two csv files I have are customers (fields= customerName,customerID,region,IsActive) with one row per customer and the second is Purchases (fields=customerName, customerID, productName, purchaseDate) with one row per purchase. I am trying to create a list of all the active customers in Northwest region and when they first purchased 5 specific products (the purchases csv include 100+ different products but I only care about these 5). Customers could have bought 0, 1, 2, 3, 4 or all 5 of the target products but after the join I want the customer record to show up five times (once for each product) regardless of it was purchased.

I was thinking of duplicating each customer in the Northwest 4 times and adding a product field with the 5 products (1 per row for each customer) but I don't know how to do that or if it makes sense.

currently this is what I have:
| from inputlookup:"customer.csv"
| where (Region="NW") AND (IsActive="True")
|join customerID type=left max=0
[|inputlookup Purchases.csv
|where (productName="Train" OR productName="Car" OR productName="Truck" OR productName="Bike" OR productName="Scooter")]

This gets me (Tom bought 2 products, Jill bought all 5 and Ken bought none):
customerName,customerID,region,IsActive,customerName, customerID, productName, purchaseDate
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7

Ken,3,NW,True,Ken,3,,,

What I want is (all customers to show up 5 times so Tom would have 3 rows without a purchase date, Jill would stay the same and Ken would have 5 rows without dates):
Tom,1,NW,True,Tom,1,Train,7/25/12
Tom,1,NW,True,Tom,1,Car,7/2/12
Tom,1,NW,True,Tom,1,Truck,
Tom,1,NW,True,Tom,1,Bike,
Tom,1,NW,True,Tom,1,Scooter,
Jill,2,NW,True,Jill,2,Train,5/3/13
Jill,2,NW,True,Jill,2,Car,7/12/15
Jill,2,NW,True,Jill,2,Truck,8/9/10
Jill,2,NW,True,Jill,2,Bike,1/2/13
Jill,2,NW,True,Jill,2,Scooter,4/5/7
Ken,3,NW,True,Ken,3,Train,
Ken,3,NW,True,Ken,3,Car,
Ken,3,NW,True,Ken,3,Truck,
Ken,3,NW,True,Ken,3,Bike,
Ken,3,NW,True,Ken,3,Scooter,

0 Karma
1 Solution

somesoni2
Revered Legend

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

View solution in original post

somesoni2
Revered Legend

Give this a try

| from inputlookup:"customer.csv" 
| where (Region="NW") AND (IsActive="True")
| eval productName=split("Train##Car##Truck##Bike##Scooter","##") 
| mvexpand productName
| lookup Purchases.csv customerID productName OUTPUT purchaseDate

cromm
Explorer

Thanks!! This is almost perfect. There are some customers who bought two different versions of the same product and I only want most recent version. I tried to remove it with:

|eval CustProd= customerName+productName
|dedup UserCert sortby -purchaseDate

but it didn't work. Do you have any ideas of how to get rid of the extra values?

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...