Hi,
I have to monitor a folder that holds a one-time load of historic data. Another system SFTPs CSV files to the Splunk instance every 15 minutes, and those files contain only updates and new inserts. I have dedup in place in my queries. For updates, will the old duplicate be removed automatically based on index time, or do I have to add something specific to remove the old records? For inserts I guess it should work fine. Experts, please guide me on this.
dedup should give you the single most recent record. However, it is an expensive command.
Alternatively, you can get the same result by running stats on the data and pulling latest(_time), latest(yourdatafield), latest(statusfield), etc., as your use case requires, since dedup will be more expensive.
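To make the comparison concrete, here is a minimal Python model (not Splunk code) of what both approaches return for records keyed by a hypothetical `record_id` field. It assumes `dedup record_id` keeps the first event per key in result order, which for a normal event search is newest `_time` first, and that `stats latest(status) by record_id` takes the value from the event with the largest `_time` per key. The field names and sample rows are made up for illustration.

```python
from typing import Dict, List

# Each dict stands in for one indexed CSV row (hypothetical fields).
Event = Dict[str, object]


def dedup_first(events: List[Event], key: str) -> List[Event]:
    """Model of `dedup key`: keep the first event seen for each key value."""
    seen = set()
    kept = []
    for ev in events:
        if ev[key] not in seen:
            seen.add(ev[key])
            kept.append(ev)
    return kept


def stats_latest(events: List[Event], key: str, field: str) -> Dict[object, object]:
    """Model of `stats latest(field) by key`: value from the event with the largest _time."""
    best: Dict[object, Event] = {}
    for ev in events:
        cur = best.get(ev[key])
        if cur is None or ev["_time"] > cur["_time"]:
            best[ev[key]] = ev
    return {k: ev[field] for k, ev in best.items()}


events = [
    {"_time": 100, "record_id": "A", "status": "open"},
    {"_time": 100, "record_id": "B", "status": "open"},
    {"_time": 200, "record_id": "A", "status": "closed"},  # later update to A
]

# Event searches return results newest-first by default, so sort the same way.
newest_first = sorted(events, key=lambda ev: ev["_time"], reverse=True)
print([(ev["record_id"], ev["status"]) for ev in dedup_first(newest_first, "record_id")])
print(stats_latest(events, "record_id", "status"))
```

Both approaches leave only the "closed" row for A. Note that neither one removes the older "open" row from the index; it is only filtered out of that search's results.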
Splunk has a delete command that makes older data unsearchable; however, make sure that is exactly what you need. Refer to the Splunk documentation: https://docs.splunk.com/Documentation/Splunk/6.5.1/SearchReference/Delete
OK, thanks a lot. I will look into the delete functionality.
So are you asking whether Splunk can detect which data from a log file has already been indexed and index ONLY the new data?
If so, then yes: Splunk will forward only data that has NOT already been indexed, without you having to run a dedup command. For example:
You have a log file that you're monitoring. That log file is currently 100 MB, and a forwarder has already forwarded that 100 MB of data. Now a flurry of calls comes in and the file grows to 110 MB. The forwarder will forward only the new 10 MB of data; it recognizes that the first 100 MB has already been forwarded and ignores it.
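The 100 MB / 110 MB example can be sketched as a small Python model of offset-based tailing: remember how many bytes of the file were already sent, and on the next pass read only from that offset onward. This is a simplified illustration under that assumption, not how the forwarder is actually implemented (the real monitor input keeps its own checkpoint state and also handles cases like file rotation); the function and file names are hypothetical.

```python
import os
import tempfile
from typing import Tuple


def read_new_data(path: str, offset: int) -> Tuple[bytes, int]:
    """Return the bytes appended since `offset` and the new offset to remember."""
    with open(path, "rb") as f:
        f.seek(offset)       # skip everything already "forwarded"
        new_data = f.read()  # read only what was appended since then
    return new_data, offset + len(new_data)


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "calls.log")  # hypothetical monitored file
    with open(log, "wb") as f:
        f.write(b"line1\nline2\n")
    sent, offset = read_new_data(log, 0)  # first pass: the whole file
    with open(log, "ab") as f:
        f.write(b"line3\n")  # the file grows
    sent_again, offset = read_new_data(log, offset)  # second pass: only the new line
    print(sent, sent_again, offset)
```

Offset tracking of this kind only avoids re-reading the same bytes; it does not compare record contents, so in this model an updated row that arrives in a new file would be read as new data.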
Actually, old records might get updates; for example, the status might change from open to closed. In that case, the old record should be deleted and only the new one retained. Any new record inserted into the database should simply be indexed. My concern is the first case, where the old record should be removed: I want to know whether dedup removes duplicate records based on index time.
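On the index-time question: dedup keeps the first event per key in whatever order the results arrive, and for a normal event search that order is by event timestamp (`_time`), newest first, not by index time. If you specifically want the most recently indexed copy of each record, one option is to sort on Splunk's internal `_indextime` field before deduplicating, for example `... | sort 0 - _indextime | dedup record_id`, where `record_id` is a hypothetical key field. The Python model below, with made-up values, assumes two rows for the same record share a timestamp column (so `_time` ties) but were indexed at different times, and shows that sorting by index time first makes the later "closed" row win.

```python
from typing import Dict, List

Event = Dict[str, object]


def dedup_first(events: List[Event], key: str) -> List[Event]:
    """Model of `dedup key`: keep the first event seen for each key value."""
    seen = set()
    kept = []
    for ev in events:
        if ev[key] not in seen:
            seen.add(ev[key])
            kept.append(ev)
    return kept


# Hypothetical rows: record A was loaded as "open", then re-sent as "closed".
# Both carry the same timestamp, so _time ties, but the update was indexed later.
events = [
    {"_time": 100, "_indextime": 1000, "record_id": "A", "status": "open"},
    {"_time": 100, "_indextime": 1900, "record_id": "A", "status": "closed"},
]

# Model of `sort 0 - _indextime | dedup record_id`: newest indexed copy first.
by_indextime = sorted(events, key=lambda ev: ev["_indextime"], reverse=True)
print(dedup_first(by_indextime, "record_id")[0]["status"])
```

As with stats latest, this only hides the older row in the search results; removing it from the index would still require the delete command.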