How do I reclaim my disk space after deleting a large number of events from an index?
The Remove data from Splunk page says:
"Currently, piping to delete does not reclaim disk space, but Splunk will be delivering a utility in a future release that reclaims the disk space--this will go through and permanently remove all the events marked by the delete operator."
Is there any other way of reclaiming this space in the meantime?
It is possible to reclaim disk space in this type of scenario by re-indexing the affected buckets.
Note: This may also be useful if you've deleted some sensitive information, such as a password, that really needs to be completely purged. This approach would prevent that indexed term from showing up in type-ahead, for example.
There are several steps to this process.
1. Locate the bucket(s) you want to purge the deleted events from. (These will not be hot buckets, since the delete command forces a bucket roll for hot buckets.)
2. Export the events from the bucket to a .csv file using exporttool.
3. Import the .csv file into a new bucket using importtool.
For users running on a unix platform, the following shell commands (script) may be of use. (Note that we are combining the export and import steps into a single operation using a pipe.)
#!/bin/bash
BUCKET=$1
# Be sure to compare the imported/exported event count. They should be the same.
exporttool ${BUCKET} /dev/stdout -csv meta::all | importtool ${BUCKET}.new /dev/stdin
# Make sure that bucket .tsidx files are optimized (and merged_lexicon.lex is up to date)
splunk-optimize ${BUCKET}.new
splunk-optimize-lex ${BUCKET}.new
# Compress all rawdata files that were not gzipped by importtool
find ${BUCKET}.new/rawdata -name '[0-9]*[0-9]' -size +1k -print0 | xargs -0 -r gzip -v9
# Swap buckets
mv ${BUCKET} ${BUCKET}.old
mv ${BUCKET}.new ${BUCKET}
# Uncomment next line if you really want to remove the original bucket automatically
# rm -rf ${BUCKET}.old
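Usage would look something like this (the script name and bucket path below are purely illustrative; warm/cold bucket directories normally follow the db_<newest>_<oldest>_<id> naming under the index's db directory):
# hypothetical invocation -- point the script at one bucket directory at a time
./rebuild_bucket.sh $SPLUNK_HOME/var/lib/splunk/defaultdb/db/db_1293840000_1262304000_12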
Note: If you plan on using this script, please be sure to add return-code checking. You wouldn't want to remove the original bucket if the export/import failed to complete, for example.
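For example, a hardened variant might look like the sketch below. It reuses the same exporttool/importtool invocation as the script above; the failure handling and cleanup are my own additions and should be adapted to your environment (e.g. if you need to run the tools via "splunk cmd").
#!/bin/bash
# Minimal sketch: same rebuild steps, but bail out before touching the
# original bucket if anything fails along the way.
set -u -o pipefail
BUCKET=$1
# Abort (and remove the partial copy) if either exporttool or importtool fails
if ! exporttool ${BUCKET} /dev/stdout -csv meta::all | importtool ${BUCKET}.new /dev/stdin; then
    echo "export/import failed for ${BUCKET}; original bucket left untouched" >&2
    rm -rf ${BUCKET}.new
    exit 1
fi
splunk-optimize ${BUCKET}.new || exit 1
splunk-optimize-lex ${BUCKET}.new || exit 1
# Compress rawdata files that importtool did not gzip
find ${BUCKET}.new/rawdata -name '[0-9]*[0-9]' -size +1k -print0 | xargs -0 -r gzip -v9
# Only swap once everything above succeeded; keep the original as .old until
# you have verified the exported/imported event counts match
mv ${BUCKET} ${BUCKET}.old && mv ${BUCKET}.new ${BUCKET}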
Other considerations: importtool does not respect your segmentation settings. The default segmentation is used for all imported events. For many setups, this will not matter, but it is something to be aware of.
Post dates on this are 2010 -- anyone know if they ever came up with that tool? (to reclaim space...)
I have 4.4 billion events -- an export/clean/import would be way ugly... 😉
As of December 2011, Splunk 4.2.5 still does not provide this functionality. The docs still say "Note: Piping to delete does not reclaim disk space.". I heard this is still on the roadmap, but it's still not available.
Not sure what you want to do exactly, but if you're deleting most of an index for which the logs are still around, you'd probably be better off deleting the index and re-indexing the events that you want to keep:
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk clean eventdata -index myindex
$SPLUNK_HOME/bin/splunk start
Yes, the link to the docs in the question does mention that option too. If you want to delete almost everything in an index, then sure, this would work. But this is NOT something you would want to do after running Splunk for any considerable length of time. Also remember that re-indexing the log files would count towards your license usage. And you also have to use tricks to get Splunk to re-read the log files you want to keep.
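For example, one commonly cited trick from the 4.x era is to also clean the fishbucket (the internal _thefishbucket index, where Splunk keeps its file-tracking checkpoints) so that monitored files are treated as new on restart. This is only a sketch with a big caveat: it resets the checkpoints for every monitored input, so disable or move aside anything you do not want re-indexed first. "myindex" is a placeholder.
$SPLUNK_HOME/bin/splunk stop
# Wipe the index being rebuilt
$SPLUNK_HOME/bin/splunk clean eventdata -index myindex
# Forget which files have already been read (resets ALL input checkpoints)
$SPLUNK_HOME/bin/splunk clean eventdata -index _thefishbucket
$SPLUNK_HOME/bin/splunk start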