Getting Data In

Will Splunk automatically remove duplicate data based on index time?

k_harini
Communicator

Hi,

I have to monitor a folder that holds one-time historical data. From another system, CSV files are SFTP'd to the Splunk instance every 15 minutes, containing only updates and new inserts. I have dedup in place in my queries. For any updates, will the old duplicate be removed automatically based on index time, or do I have to do anything specific to remove the old records? For inserts I guess it should work fine. Experts, please guide me on this.

1 Solution

niketn
Legend

dedup should give you the most recent single record. However, it is an expensive command.

Alternatively, you can get the same result by running stats over the data and pulling latest(_time), latest(yourdatafield), latest(statusfield), etc., as your needs dictate, since dedup will be more expensive.
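A minimal sketch of the stats approach, assuming hypothetical field names (record_id as the unique key, status and yourdatafield as the payload fields; substitute your own):

```
index=yourindex sourcetype=yourcsv
| stats latest(_time) AS _time latest(status) AS status latest(yourdatafield) AS yourdatafield BY record_id
```

This keeps one row per record_id, carrying the values from the chronologically latest event for that key, without scanning and discarding duplicates the way dedup does.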

Splunk also has a delete command, which renders older data unsearchable; be careful, though, about whether that is exactly what you require. Refer to the Splunk documentation: https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Delete
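As a sketch only (the index, sourcetype, and field values are placeholders), removing a superseded record means first searching for exactly the stale events, then piping to delete. Note that delete requires a role with the can_delete capability, and it only makes events unsearchable; it does not reclaim disk space:

```
index=yourindex sourcetype=yourcsv record_id=12345 status="open"
| delete
```

Always run the search without the | delete first and verify it matches only the events you intend to remove.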

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

k_harini
Communicator

Ok, thanks a lot. I will check out the delete functionality.


skoelpin
SplunkTrust
SplunkTrust

So are you asking whether Splunk can detect that data from a log file has already been indexed, and index ONLY the new data?

If so, then yes: Splunk will only forward new data that has NOT already been indexed, without you having to run a dedup command. An example:

You have a log file that you're monitoring. That log file is currently 100 MB, and a forwarder has already forwarded that 100 MB of data. Now a flurry of calls comes in and the file grows to 110 MB. The forwarder will forward only the new 10 MB, recognizing that the first 100 MB has already been forwarded, and ignore it.
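This tail-tracking behavior comes from the file monitor input: the forwarder records how much of each file it has already read, so only newly appended bytes are forwarded. A minimal inputs.conf stanza for the scenario above (the path, sourcetype, and index name are examples, not your actual values):

```
[monitor:///data/incoming/updates]
disabled = false
sourcetype = csv
index = yourindex
```

Note this only prevents re-indexing of bytes the forwarder has already read; it does not deduplicate records that arrive again in a brand-new file.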


k_harini
Communicator

Actually, old records might receive updates; for example, a status might change from open to closed. In that case the old record should be deleted and only the new one retained. Any new record inserted into the database should simply be indexed. I'm concerned about the first case, where the old record should be removed. I want to know whether dedup will remove duplicate records based on index time.
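One clarification worth stating: dedup does not remove anything from the index; it only filters search results at query time. To keep the copy of each record that was indexed most recently, one sketch (record_id is a placeholder for your key field) is to sort by _indextime before deduplicating:

```
index=yourindex sourcetype=yourcsv
| sort 0 - _indextime
| dedup record_id
```

The 0 in sort removes the default 10,000-row limit; dedup then keeps the first event it sees per record_id, which after the sort is the most recently indexed one.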
