Splunk Search

Search - Merge Fields to Dedup

gc12345
Engager

Hi 

I wish to dedup and consolidate customer details across two cities.

E.g.  I have 2 records of the same customer across two cities and I want to consolidate them into 1 row

NewCustomerIDCity1_CustomerIDCity2_CustomerIDCityisActiveCustomer
1234500001 AY
12345 00002BN

 

Result: 

* Merge City1_CustomerID and City2_CustomerID into one row

* City Field populates the City where the customer is active

* Field for ExistsInOtherCities is "Y" when there are 2 or more records.

 

NewCustomerIDCity1_CustomerIDCity2_CustomerIDCityisActiveCustomerExistsInOtherCities
123450000100002AYY

 

I am used to SQL where I can make temp tables in each in then join it back but unsure how to do it in SPLUNK

Labels (2)
Tags (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust
| eval activeCity=if(isActiveCustomer=="Y",City,null)
| stats list(City_CustomerID) as City_CustomerID first(activeCity) as activeCity dc(City) as City_count by NewCustomerID
| eval City1_CustomerID=mvindex(City_CustomerID,0)
| eval City2_CustomerID=mvindex(City_CustomerID,1)
| eval City=activeCity
| eval isActiveCustomer=if(isnotnull(City),"Y","N")
| eval ExistsInOtherCities=if(City_count>1,"Y","N")

View solution in original post

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

What do your events look like? For example, do events from different cities use different field names, and what does it look like if there are more than 2 cities? Can only one city be active for the customer? Is the new customer id a different field to the city customer id?

0 Karma

gc12345
Engager

- They use the same field names (The Data Admin provided me a lookup to use rather than streaming events)

- It cannot be more than 2 cities 

- Both cities can be active but choose the first record

- The new Customer ID can sometimes be the same as the city customer id 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
| eval activeCity=if(isActiveCustomer=="Y",City,null)
| stats list(City_CustomerID) as City_CustomerID first(activeCity) as activeCity dc(City) as City_count by NewCustomerID
| eval City1_CustomerID=mvindex(City_CustomerID,0)
| eval City2_CustomerID=mvindex(City_CustomerID,1)
| eval City=activeCity
| eval isActiveCustomer=if(isnotnull(City),"Y","N")
| eval ExistsInOtherCities=if(City_count>1,"Y","N")
0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Kick the Tires Before You Commit: A Hands-On Tour of the Splunk Observability Cloud ...

Evaluating an enterprise observability platform usually goes like this: fill out a form, get a free trial with ...

Deep insights, no barriers: Splunk Observability Cloud Free Edition

As software delivery cycles continue to accelerate, observability shouldn’t be a luxury — it should be a ...

Monitoring AI Agents with Splunk Observability Cloud

Let’s say I’m running a travel planning AI app in production. A user asks for three concise hotel options in ...