Splunk DB Connect: How to create a batch input that only loads latest data in the index, removing data from previous runs?

Path Finder

Hello There,

I have recently started using the DB Connect app, and I want to create a batch input that runs every 120 minutes.

I have set this up, and it now loads data into the index every 120 minutes. Is there any way to keep only the latest data in the index and remove the data added by previous runs of this batch input?

In other words, I want each batch run to overwrite the data in the index rather than append to it.

Any help would be greatly appreciated.

Thank you.

Madhav

SplunkTrust

That's not how Splunk works. Once data is indexed, it stays in the index until it expires under the index's retention settings; a later batch run cannot overwrite it.

Can you change the input mode to select only new data?

---
If this reply helps you, an upvote would be appreciated.

Path Finder

Thank you for the reply. Actually, I do not have a date/time column that I can use for a rising input. The data I want from the DB is a flag column whose values change from Yes to No and from No to Yes, so I thought of using a batch input. If I create an index with a retention policy of 3 or 5 days, that should help, I believe.
Thank you for your input.

SplunkTrust

Yes, a short retention period will help. You can also use dedup and other SPL commands to eliminate duplicate data from your search results.
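
As a minimal sketch of the dedup approach, the search below keeps only the most recent copy of each row. The index name `db_flags` and the field names `record_id` (a column that uniquely identifies a row) and `flag` are hypothetical placeholders, not names from this thread; substitute your own.

```spl
index=db_flags earliest=-5d
| dedup record_id
| table _time record_id flag
```

This relies on two documented behaviors: a search returns events newest first by default, and dedup keeps the first event it sees for each value of the listed field. Together, the surviving event for each `record_id` is the one from the latest batch run that contained it.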

Path Finder

Thanks, I have created a separate index with a 5-day retention period. And yes, I can use dedup, but I wanted to keep the index as clean as possible by removing redundant data. Thank you for your help with this.
If you convert your comment to an answer, I can accept it. Thank you.
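
One way to confirm that the retention setting took effect is to read it back through the REST search command. This is only a sketch: `db_flags` is a hypothetical index name, and the `retention_days` field is computed here for readability.

```spl
| rest /services/data/indexes
| search title="db_flags"
| eval retention_days = round(frozenTimePeriodInSecs / 86400, 1)
| table title frozenTimePeriodInSecs retention_days currentDBSizeMB
```

A 5-day retention corresponds to `frozenTimePeriodInSecs` = 432000 (5 × 86400). Note that Splunk expires whole buckets, and only once the newest event in a bucket is older than the limit, so some events can remain searchable a little longer than 5 days.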

Splunk Employee

If you're interested, I wrote a dedup example for this exact scenario that lets you delete the duplicates:
https://github.com/tmuth/splunking-oracle/blob/master/Duplicates/delete-duplicates.spl
Keep in mind that Splunk doesn't really delete the duplicates; it only marks them as deleted, so you won't save space on disk. The short retention period suggested by Rich solves that. What this technique does give you is the ability to search without using the dedup command.
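
Before deleting anything, it can help to preview which events count as older copies. The search below is a rough sketch of that preview and is not the contents of the linked script; `db_flags` and `record_id` are hypothetical names for the index and for a column that uniquely identifies a row.

```spl
index=db_flags
| streamstats count AS copy_number BY record_id
| where copy_number > 1
| stats count AS older_copies BY record_id
```

Because events arrive newest first, `copy_number` is 1 for the latest copy of each row and greater than 1 for every older copy, so the result lists how many redundant events each row has accumulated. Actually removing them from search results requires the delete command, which is available only to users with the `can_delete` role.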
