Solved: Will Splunk automatically remove duplicate data based on index time?

k_harini · 12-02-2016

Hi,

I have to monitor a folder that already contains a one-time load of historic data. Another system now SFTPs CSV files to the Splunk instance every 15 minutes, containing only updates and new inserts. I have dedup in place in my queries. For updates, will the old duplicate be removed automatically based on index time, or do I have to do something specific to remove old records? For inserts I guess it should work fine. Experts, please guide me on this.

niketn · 12-02-2016

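Dedup should give you the most recent single record, because it keeps the first event it finds for each value and search results come back newest first. However, it is an expensive command.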
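A minimal sketch of that dedup search, assuming each CSV row carries a unique key in a field called record_id and that the data is indexed into your_index with sourcetype your_csv_sourcetype (all hypothetical names). Since events come back newest first by _time, dedup keeps the latest version of each record and drops the older ones from the results only; the older events remain in the index.

```spl
index=your_index sourcetype=your_csv_sourcetype
| dedup record_id
| table _time record_id status
```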

Alternatively, since dedup is more expensive, you can get the same result by running stats on the data and pulling latest(_time), latest(yourdatafield), latest(statusfield), etc., as your use case requires.
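The stats alternative could look like the following sketch, again assuming the hypothetical record_id key and the hypothetical index/sourcetype names your_index and your_csv_sourcetype; status and yourdatafield stand in for whatever fields you actually need. For each record_id, latest() returns the value from the event with the most recent _time, so each record collapses to one row carrying its newest values.

```spl
index=your_index sourcetype=your_csv_sourcetype
| stats latest(_time) AS _time latest(status) AS status latest(yourdatafield) AS yourdatafield BY record_id
```

To list only records whose most recent status is still open, you could append | where status="open" to the search.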

Splunk also has a delete command that makes older data unsearchable; however, be careful that this is exactly what you require. Refer to the Splunk documentation: https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Delete
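If you do go that route, a cautious sketch follows. It assumes the superseded version can be isolated by its key and its old status value (record_id, the status value, and the index/sourcetype names are hypothetical). Run the search without the final | delete first to preview exactly which events would be hidden. Per the documentation, using delete requires the delete_by_keyword capability, which by default only the can_delete role has, and it only marks events as unsearchable; it does not free disk space.

```spl
index=your_index sourcetype=your_csv_sourcetype record_id="12345" status="open"
| delete
```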

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

k_harini · 12-02-2016

OK, thanks a lot. I will check on the delete functionality.

skoelpin · 12-02-2016

So are you asking whether Splunk can detect that data from a log file has already been indexed, and index ONLY the new data?

If so, then yes: Splunk will only forward new data that has NOT already been indexed, without you having to run a dedup command. For example:

You have a log file that you're monitoring. That file is currently 100MB, and a forwarder has already forwarded those 100MB. Then a flurry of calls comes in and the file grows to 110MB. The forwarder recognizes that the first 100MB have already been forwarded, ignores them, and forwards only the new 10MB.
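For the 15-minute CSV drops in this thread, one way to sanity-check that each delivered file was indexed once rather than re-read is to compare, per source file, the event count and the span of index times. This is a sketch with hypothetical index and sourcetype names; a file that was read a second time would typically show a doubled event count and a widely spread range of index times.

```spl
index=your_index sourcetype=your_csv_sourcetype
| eval indextime=_indextime
| stats count min(indextime) AS first_indexed max(indextime) AS last_indexed BY source
| convert ctime(first_indexed) ctime(last_indexed)
```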

k_harini · 12-02-2016

Actually, old records might get updates; for example, the status might change from open to closed. In that case the old record should be deleted and only the new one retained. Any new record inserted into the database should simply be indexed. My concern is the first case, where the old record should be removed. I want to know whether dedup will remove duplicate records based on index time.
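On the index-time question: dedup does not consult index time. It keeps the first event it encounters for each value, and by default results are ordered by event time (_time), not by _indextime. Nor does it delete anything; it only hides duplicates in that search's results. A sketch that keeps the most recently indexed version of each record, assuming the hypothetical record_id and status fields and index/sourcetype names, sorts on a copy of _indextime before deduplicating:

```spl
index=your_index sourcetype=your_csv_sourcetype
| eval indextime=_indextime
| sort 0 -indextime
| dedup record_id
| table record_id status indextime
```

The 0 tells sort not to truncate the results (by default it keeps only the first 10,000); sorting the whole result set adds its own cost on large data.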
