Is MongoDB-style compaction of indexes, to reclaim space after data is deleted, a built-in option in Splunk 9? Previous posts indicated it was not possible, apparently because WiredTiger was not the standard storage engine under the hood. Now that WiredTiger is required in Splunk 9, is 'compact' supported as a standard feature?
I need an elegant, supported way to free up disk space after deleting data.
When you realize that something has flooded your indexes and filled up disk space, what is the supported way to remove the individual events from the index and reclaim the disk space without losing the rest of the data in the index that you want to keep? The emphasis is on reclaiming the disk space. Is the only option to let the buckets age out to frozen?
Thanks.
WiredTiger is the storage engine for the KV store. It has nothing to do with the actual data in indexes.
I don't expect there will ever be a "compact" feature for indexes. Indexes are supposed to expire and roll over, so the space should eventually be reclaimed.
And the delete command is not something that's supposed to be used lightly.
Makes sense. I thought the underlying storage engine for the indexes was MongoDB, which is why I assumed there must be a way to reclaim space.
We can wait for data to roll off, but in our case we have a long retention period (1-2+ years), which we need for investigations going back a couple of years. On top of that, we have hardened boxes that generate a lot of log data to begin with. Together, these mean a lot of undesired noise can build up in the indexes before you notice it, and then you are stuck with a storage problem for a couple of years while you wait for it to roll off, because the same index holds other good data that you do want to retain.
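To put rough numbers on that, here is a minimal sketch of the arithmetic. The 730-day retention, the 5 GB/day of on-disk noise, and the 30-day spill duration are made-up figures for illustration only; frozenTimePeriodInSecs is the indexes.conf retention setting, which is expressed in seconds.

```python
SECONDS_PER_DAY = 86_400


def retention_seconds(days: int) -> int:
    """Retention period in seconds, the unit used by frozenTimePeriodInSecs."""
    return days * SECONDS_PER_DAY


def noise_footprint_gb(noise_gb_per_day: float, spill_days: int) -> float:
    """On-disk space held by the unwanted events until their buckets freeze."""
    return noise_gb_per_day * spill_days


if __name__ == "__main__":
    # Hypothetical values: 2-year retention, 5 GB/day of noise for 30 days.
    print(retention_seconds(730))        # 63072000
    print(noise_footprint_gb(5.0, 30))   # 150.0
```

With those assumed figures, about 150 GB stays on disk for close to the full two years, since the buckets holding it only freeze once their newest event passes the retention age.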
There must be a way to clean up what amounts to a "data spill" in your indexes while retaining the good data.
The documentation for the 'delete' command indicates that it only hides the data from regular searches, leaves references to it in metadata searches, and never reclaims the space. Is it marking those locations for overwrite? Will that space be reused by other data as long as the bucket is still active?
Thanks.
I don't know the internal workings of tsidx files (for that you'd have to ask a Splunk developer). But the general idea is that buckets are append-only.
If something in the middle is "deleted", it's just marked as unavailable, and I wouldn't expect any space-reclaiming mechanism. Even if it were possible, it's such a rare use case that implementing it is probably not worth the effort.
I suppose there might be a way to remove a whole bucket if all the data in it is marked as deleted (if you're brave enough, you could just remove the "empty" bucket from the index directory and hope that Splunk generates warnings and nothing more, but I wouldn't recommend this approach on a production server).
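If you only want to see which buckets would even be candidates, a read-only sketch like the following could help. It assumes warm/cold bucket directories are named db_<newest_epoch>_<oldest_epoch>_<local_id>, optionally followed by a GUID on clustered indexers; the function names and the sample directory names are hypothetical. It only inspects names and deletes nothing. A bucket whose time range lies inside the noise window can still contain good events, so this narrows the list and proves nothing about the contents.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout: db_<newest_epoch>_<oldest_epoch>_<local_id>[_<guid>]
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)(?:_[0-9A-Fa-f-]+)?$")


class BucketSpan(NamedTuple):
    name: str
    newest: int  # epoch seconds of the newest event in the bucket
    oldest: int  # epoch seconds of the oldest event in the bucket


def parse_bucket(name: str) -> Optional[BucketSpan]:
    """Parse a bucket directory name; return None if it does not match."""
    match = BUCKET_RE.match(name)
    if match is None:
        return None
    return BucketSpan(name, int(match.group(1)), int(match.group(2)))


def buckets_inside(names: Iterable[str], start: int, end: int) -> List[BucketSpan]:
    """Return buckets whose whole time range lies within [start, end]."""
    result = []
    for name in names:
        bucket = parse_bucket(name)
        if bucket is not None and start <= bucket.oldest and bucket.newest <= end:
            result.append(bucket)
    return result


if __name__ == "__main__":
    # Hypothetical directory names, e.g. collected with os.listdir() on the index's db path.
    names = ["db_1700003600_1700000000_12", "db_1700090000_1699990000_13", "hot_v1_14"]
    for bucket in buckets_inside(names, 1700000000, 1700010000):
        print(bucket.name)
```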
That's one of the reasons why a good data onboarding process is important.
BTW, note that if you have already ingested noise data, you have also inadvertently consumed some license quota, which means you could perhaps have gotten by with a lower license level. So be careful about the data you ingest.