Splunk Search

How to delete duplicates from Lookup csv file ?

neerajs_81
Builder

Hello, we have a CSV lookup file that is populated by a saved search. We are noticing that a lot of duplicate rows are being created every other day. The file doesn't open in the Lookup Editor app because its size is > 10 MB. Can someone please advise how to delete the duplicates via a query?



ITWhisperer
SplunkTrust

Change the saved search or post-process the saved search to remove duplicates before writing the csv.

There are a number of ways to remove duplicates, depending on your criteria. For example, when there is a "duplicate", is the row completely duplicated across all fields, or only across a subset of them? If it is a subset, which version takes priority, e.g. first, last, max, min, etc.? If it is not a subset, is the order in any way significant (unlikely if the file is being used as a lookup, but worth considering anyway)?
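As a rough sketch of a one-off cleanup, assuming the lookup is called my_lookup.csv and that a row counts as a duplicate when it matches on key_field_1 and key_field_2 (all three names are placeholders, not taken from your environment), you could read the file, deduplicate it, and write it back:

```spl
| inputlookup my_lookup.csv
| dedup key_field_1 key_field_2
| outputlookup my_lookup.csv
```

dedup keeps the first row it sees for each combination of the listed fields, so if a particular version should win, sort before deduplicating. For example, assuming a hypothetical last_seen field whose values sort correctly (such as epoch time), this keeps the most recent row per key:

```spl
| inputlookup my_lookup.csv
| sort 0 - last_seen
| dedup key_field_1 key_field_2
| outputlookup my_lookup.csv
```

Note that outputlookup overwrites the existing file, so it is worth running the search without the last line first and checking the row count. The same dedup step can be placed in the saved search just before it writes the lookup, so that duplicates are not written in the first place.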

neerajs_81
Builder

I have actually updated the problem scenario in another post and tagged you in it. I just realized it's not really duplicates, but results getting appended to the data in the previous row. Please see below. Can you help?

https://community.splunk.com/t5/Splunk-Search/How-to-make-a-Search-NOT-append-results-from-previous-...
