Splunk Search

Search - Merge Fields to Dedup

gc12345
Engager

Hi 

I wish to dedup and consolidate customer details across two cities.

E.g.  I have 2 records of the same customer across two cities and I want to consolidate them into 1 row

NewCustomerIDCity1_CustomerIDCity2_CustomerIDCityisActiveCustomer
1234500001 AY
12345 00002BN

 

Result: 

* Merge City1_CustomerID and City2_CustomerID into one row

* City Field populates the City where the customer is active

* Field for ExistsInOtherCities is "Y" when there are 2 or more records.

 

NewCustomerIDCity1_CustomerIDCity2_CustomerIDCityisActiveCustomerExistsInOtherCities
123450000100002AYY

 

I am used to SQL where I can make temp tables in each in then join it back but unsure how to do it in SPLUNK

Labels (2)
Tags (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust
| eval activeCity=if(isActiveCustomer=="Y",City,null)
| stats list(City_CustomerID) as City_CustomerID first(activeCity) as activeCity dc(City) as City_count by NewCustomerID
| eval City1_CustomerID=mvindex(City_CustomerID,0)
| eval City2_CustomerID=mvindex(City_CustomerID,1)
| eval City=activeCity
| eval isActiveCustomer=if(isnotnull(City),"Y","N")
| eval ExistsInOtherCities=if(City_count>1,"Y","N")

View solution in original post

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

What do your events look like? For example, do events from different cities use different field names, and what does it look like if there are more than 2 cities? Can only one city be active for the customer? Is the new customer id a different field to the city customer id?

0 Karma

gc12345
Engager

- They use the same field names (The Data Admin provided me a lookup to use rather than streaming events)

- It cannot be more than 2 cities 

- Both cities can be active but choose the first record

- The new Customer ID can sometimes be the same as the city customer id 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| eval activeCity=if(isActiveCustomer=="Y",City,null)
| stats list(City_CustomerID) as City_CustomerID first(activeCity) as activeCity dc(City) as City_count by NewCustomerID
| eval City1_CustomerID=mvindex(City_CustomerID,0)
| eval City2_CustomerID=mvindex(City_CustomerID,1)
| eval City=activeCity
| eval isActiveCustomer=if(isnotnull(City),"Y","N")
| eval ExistsInOtherCities=if(City_count>1,"Y","N")

View solution in original post

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!