Splunk Search

Removing duplicate entries considering multiple fields

k_harini
Communicator

Hi,

I have a file containing 1000 records. Values repeat across records for each of the fields, e.g. camp_label, del_code, Rec_ID, event time, etc.
Even the timestamps are not unique.
When the same file is added again the next day with additional records (the original 1000 plus 500 new entries), I do not want the duplicate entries from the previous file to be indexed. At present the index contains the duplicate records as well (2500 records). How can I eliminate the duplicates efficiently? Will dedup with multiple fields work (like a composite key), or is there a better way? Please suggest.


woodcock
Esteemed Legend

Assuming this is *NIX and you are monitoring file file.txt and the file is updated once a day:

Change inputs.conf to look for file.new instead of file.txt.
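Add this cron job to run after the daily update (I am assuming the update happens at 2 AM, so I picked 3 AM). Note the -u flag: without it, diff marks added lines with ">" rather than "+", and the grep would match nothing:

00 3 * * * /bin/diff -u file.prev file.txt | /bin/grep "^+" | /bin/awk  'NR>1' | /bin/sed "s/^+//" > file.new && /bin/mv file.txt file.prev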

The first time that you set this up, do this:

mv file.txt file.prev
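
To see what the pipeline in the cron job extracts, here is a minimal sketch run against made-up sample data in a scratch directory. The record contents and the temporary directory are assumptions for illustration only. It relies on unified diff output (diff -u), where added lines start with "+" and the first such line is the "+++ file.txt" header that awk 'NR>1' discards.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (not the real monitored path).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a,1\nb,2\n' > file.prev            # yesterday's copy
printf 'a,1\nb,2\nc,3\nd,4\n' > file.txt   # today's file: same records plus two new ones

# diff -u prefixes added lines with "+".
# grep keeps the "+" lines, awk drops the "+++ file.txt" header (the first of them),
# and sed strips the leading "+" so only the raw new records remain.
diff -u file.prev file.txt | grep '^+' | awk 'NR>1' | sed 's/^+//' > file.new

cat file.new
```

With this sample data, file.new should end up containing only the two appended records, which is what Splunk would then index.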

somesoni2
Revered Legend

The dedup command can work with multiple fields, which then act as a composite key. If today's data contains all of yesterday's records, why not just take the latest day's data so that there are no duplicates?


k_harini
Communicator

In continuous monitoring mode, how do I take only the latest day's data? Also, the records will not always all repeat. When records do repeat, I want to discard them and keep only the new ones.
Event timestamps are not unique.
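
For the case described here, where only some records repeat and they may appear in any position, one possible pre-indexing approach is to compare the two files as sorted sets of whole lines with comm. This is only a sketch under stated assumptions: the sample records are invented, a repeated record is assumed to be byte-for-byte identical to its earlier copy, and the output comes out in sorted order rather than the original file order.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'r1,x\nr2,y\nr3,z\n' > file.prev          # records already indexed
printf 'r3,z\nr4,w\nr1,x\nr5,v\n' > file.txt     # today's file: some repeats, some new, mixed order

# comm requires both inputs to be sorted the same way.
sort file.prev > prev.sorted
sort file.txt > today.sorted

# -1 hides lines found only in the previous file, -3 hides lines found in both,
# so what remains is the set of lines that appear only in today's file.
comm -13 prev.sorted today.sorted > file.new

cat file.new
```

Because the comparison is on the full line, it does not depend on unique timestamps or on new records being appended at the end of the file.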
