Getting Data In

Will Splunk automatically remove duplicate data based on index time?

k_harini
Communicator

Hi,

I have to monitor a folder that holds one-time historic data. From another system, CSV files are SFTP'd to the Splunk instance every 15 minutes, containing only updates and new inserts. I have dedup in place for the queries. For any updates, will the old duplicate be removed automatically based on index time, or do I have to do anything specific to remove the old records? For inserts I guess it should work fine. Experts, please guide me on this.

0 Karma
1 Solution

niketn
Legend

dedup should give you the single most recent record. However, it is an expensive command.

Alternatively, you can achieve the same result by running stats on the data and pulling latest(_time), latest(yourdatafield), latest(statusfield), etc., as per your need, since dedup will be more expensive.
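A minimal sketch of the stats approach. The index, sourcetype, and field names (record_id, status) here are hypothetical placeholders; substitute the key and value fields from your own CSVs:

```spl
index=your_index sourcetype=your_csv_updates
| stats latest(_time) AS _time latest(status) AS status BY record_id
```

Grouping by the record's unique key and taking latest() of each field returns only the newest version of every record at search time, without touching what is stored in the index.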

Splunk also has a delete command with which you can render older data unsearchable; however, be careful about whether that is exactly what you require. Refer to the Splunk documentation: https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Delete
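A hedged sketch of the delete workflow, again with hypothetical field names (record_id, status). Note that delete requires a role with the can_delete capability, it only marks events unsearchable, and it does not reclaim disk space. Run the search without the final pipe first to confirm it matches only the events you intend to remove:

```spl
index=your_index sourcetype=your_csv_updates record_id=12345 status=open
| delete
```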

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"



k_harini
Communicator

OK, thanks a lot. I will check the delete functionality.

0 Karma

skoelpin
SplunkTrust
SplunkTrust

So are you asking whether Splunk can detect that data from a log file has already been indexed and only index the new data?

If so, then yes: Splunk will only forward new data that has NOT already been indexed, without you having to run a dedup command. An example:

You have a log file that you're monitoring. That log file is currently 100 MB, and a forwarder has already forwarded that 100 MB of data. Now a flurry of calls comes in and the file grows to 110 MB. The forwarder will forward only the new 10 MB, recognizing that the first 100 MB has already been forwarded, and ignore it.
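This append-only tracking is what a standard monitor input gives you. A minimal sketch of an inputs.conf stanza, with a hypothetical path, index, and sourcetype:

```ini
[monitor:///opt/data/updates/*.csv]
index = your_index
sourcetype = your_csv_updates
disabled = false
```

One caveat for this use case: the forwarder tracks how far it has read into each file, so appends are indexed once. If the upstream system instead rewrites a CSV in place (as an "update" often does), Splunk may re-index the file or skip it depending on its CRC checks, so updated records can still land in the index alongside the old versions.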

0 Karma

k_harini
Communicator

Actually, for old records there might be updates to them; for example, the status might change from open to closed. In that case the old record should be deleted and only the new one retained. Any new record inserted into the database should simply be indexed. I'm concerned about the first case, where the old record should be removed. I want to know whether dedup will remove duplicate records based on index time.

0 Karma