I have an index (a few million rows) that I need to delete and re-index with new data every night from a DB input. The data doesn't support a good way for me to use a rising column (or I would use one), and the team that uses the DB is back-dating data in there too, which makes it no fun to search for updates.
Today I'm using "| delete" in a scheduled search against that index, then running the DB Connect input a few minutes after that. Is this the best way to do this? Looking for any suggestions on how best to completely remove the data in an index and reload it. No other automation tools are available to me at this time (Puppet, Chef, etc.).
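For reference, the nightly cleanup is essentially a scheduled search like this (index name hypothetical; the search has to run as a user with the can_delete role):

```
index=db_nightly | delete
```

The DB Connect input is then scheduled to run a few minutes after this search completes.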
Thanks!
Joe
@joesrepsolc if you need to refresh DB data pulled into Splunk, a better approach would be to use dbxquery to fetch the data and update a KV Store, then accelerate the collection and replicate it to the indexers to support querying across millions of rows. (PS: the collection also should not have too many columns.)
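A minimal sketch of that refresh, assuming a DB Connect connection named my_db, a source table named source_table, and a KV Store-backed lookup named db_snapshot_lookup (all hypothetical names):

```
| dbxquery connection="my_db" query="SELECT id, name, updated_at FROM source_table"
| outputlookup db_snapshot_lookup
```

By default outputlookup replaces the lookup's contents, so scheduling this search nightly refreshes the snapshot without consuming any indexing license.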
delete is a bad way to remove searchable data from an index, as the deleted events still occupy space on your indexers. Refer to the docs: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Delete
Also, indexing millions of records every day and then purging them seems like a bad use of license.
Check out one of my older answers for details on KV Store acceleration and replicating a KV Store to the indexers: https://answers.splunk.com/answers/432770/scaling-kv-store-performance.html
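Acceleration and replication are configured on the collection itself. A sketch, assuming a collection named db_snapshot and a lookup definition named db_snapshot_lookup (both hypothetical), on the search head:

```
# collections.conf
[db_snapshot]
replicate = true
accelerated_fields.by_id = {"id": 1}

# transforms.conf
[db_snapshot_lookup]
external_type = kvstore
collection = db_snapshot
fields_list = _key, id, name, updated_at
```

replicate = true bundles the collection out to the indexers so lookups against it can run at search time on the indexing tier, and accelerated_fields defines an index on the collection for faster lookups on that field.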
Thanks for the response niketnilay...
I am aware that it's not good practice, that it doesn't clear up the actual space in the index, and that it uses more license every day that I reload the complete dataset... but we haven't come up with another way (yet).
I've not used the KV Store solution to date, but I need to learn more about it. I'm unsure it's a good fit for this volume of records either. I'm trying to read more about the limitations, maximums, and use cases for KV Store before committing to that solution. Not feeling that it will do much for me at this time.