Is it possible to thaw out more than one bucket at once? Or do you have to run a rebuild for each one, one by one?
I have to thaw out months and months' worth of data, something like hundreds of buckets. I'd hate to have to rebuild them one at a time.
The following script lets you specify which indexes to rebuild (using globbing/wildcards) and, optionally, a time window, in case you don't want to rebuild every bucket in your thaweddb directories. It also runs the rebuilds as multiple concurrent background jobs (child processes), so you can rebuild several buckets CONCURRENTLY, which is what most people who land here are after (this speeds things up as long as the host isn't already short on CPU, memory, or disk I/O). The script is also safe to restart: before rebuilding a bucket it checks whether the bucket already has index (.tsidx) files, so re-running it won't hurt anything. Cheers and happy Splunking!
#!/usr/bin/env bash
# initialize variables
# --------------------
# Set to the number of buckets you want to rebuild concurrently
_maxChildProcs=10
# Set to the same path as $SPLUNK_HOME ($SPLUNK_HOME/bin/splunk should be a valid path to the Splunk CLI)
_splunkHome="/opt/splunk"
# You must use a wildcard for the index name if you wish to restore buckets for multiple indexes
_thawedDBPath="/opt/splunk/var/lib/splunk/*/thaweddb"
# Set the earliest time (epoch) for buckets you would like to restore; buckets whose newest event is older than this are not rebuilt
# Note: if left empty, no earliest time is enforced
_earliestTime=
# Set the latest time (epoch) for buckets you would like to restore; buckets whose oldest event is newer than this are not rebuilt
# Note: if left empty, no latest time is enforced
_latestTime=
# List thawed buckets that overlap the time window
# ------------------------------------------------
# Bucket names look like db_<newestEpoch>_<oldestEpoch>_<id>[_<guid>]. Only the directory
# name is split on "_", so underscores in index names don't shift the fields, and the
# clustered <guid> suffix stays in the path that gets rebuilt.
list_buckets() {
    local _bucket _prefix _newest _oldest _rest
    for _bucket in ${_thawedDBPath}/*; do
        [ -d "$_bucket" ] || continue
        IFS=_ read -r _prefix _newest _oldest _rest <<< "$(basename "$_bucket")"
        # skip directories that don't follow the bucket naming convention
        [[ $_newest =~ ^[0-9]+$ && $_oldest =~ ^[0-9]+$ ]] || continue
        [ -n "$_earliestTime" ] && [ "$_newest" -lt "$_earliestTime" ] && continue
        [ -n "$_latestTime" ] && [ "$_oldest" -gt "$_latestTime" ] && continue
        echo "$_bucket"
    done
}
# Spawn child processes to rebuild the selected thawed buckets
# ------------------------------------------------------------
while read -r _thawedBucket; do
    # Rebuild the bucket only if it hasn't been rebuilt already (no .tsidx files yet)
    if compgen -G "$_thawedBucket/*.tsidx" > /dev/null; then
        echo "Already rebuilt: $_thawedBucket"
        continue
    fi
    echo "Rebuilding bucket: $_thawedBucket..."
    # Newer Splunk versions may also require the index name: rebuild <bucket> <index-name>
    "${_splunkHome}/bin/splunk" rebuild "$_thawedBucket" &
    # throttle child process count
    # ----------------------------
    while _running=($(jobs -rp)); [ "${#_running[@]}" -ge "$_maxChildProcs" ]; do
        sleep 0.5
    done
done < <(list_buckets)
# Wait for the last rebuilds to finish before exiting
wait
Tested one-shot example that rebuilds all thawed buckets for a single index; it also passes the index name, as newer versions require ({index-name} is a placeholder for your index name, without the braces):
ls -dA /data/idx/hot/splunk/{index-name}/thaweddb/db_* | xargs -I BUCKET --max-procs=10 /opt/splunk/bin/splunk rebuild BUCKET {index-name}
stoomart, your script appears to be exactly what I need, but when I run it on a RHEL 9 box it immediately switches to the fsck command with a single-bucket option, then displays the error message "path is not extant". Any thoughts? Also, please clarify where you show "{index-name}": are the braces placeholders, or are they meant to be used literally in the thawed bucket path and at the end of the command?
Thank you in advance for your help!
Thank you very much! After 8 years this script is still relevant and working correctly! Karma is given!
Hint: if you are running this against a large amount of data over SSH, use nohup and run it as a background job as in the example below (this keeps it running if you lose your SSH session):
nohup [sudo] bash {name-of-script}.sh &
Do we know if this still works with version 7+?
Thanks! This looks good. I just got a request to recover some data so I'll test this next week.
We use a cluster so all of the directories end with the indexer id.
db_1268768214_1268768057_1670_ABCDEFAB-6666-EEEE-9999-AAAAAAAAAAAAA
They also have buckets that begin with rb_ for the replicated buckets.
rb_1329872628_1268768214_1670_ABCDEFAB-6666-EEEE-9999-AAAAAAAAAAAAA
I'll hard code $1 as db and add a $5 after the $4s.
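For anyone adapting the awk filter for clusters, here is a small bash sketch of the same parsing. It assumes the db_/rb_ naming shown in the examples above (newest epoch, oldest epoch, local ID, then an optional GUID on clustered indexers). `parse_bucket` and `bucket_in_window` are hypothetical helper names, not Splunk commands. Only the directory name is split, so underscores elsewhere in the path can't shift the fields, and no hard-coded $5 is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Splunk) for bucket directory names like:
#   db_<newestEpoch>_<oldestEpoch>_<localId>          standalone indexer
#   db_<newestEpoch>_<oldestEpoch>_<localId>_<guid>   cluster, original copy
#   rb_<newestEpoch>_<oldestEpoch>_<localId>_<guid>   cluster, replicated copy

# Print the fields of a bucket name. Only the last path component is split,
# so underscores in the index name or path don't shift the fields.
parse_bucket() {
    local prefix newest oldest id guid
    IFS=_ read -r prefix newest oldest id guid <<< "$(basename "$1")"
    printf 'prefix=%s newest=%s oldest=%s id=%s guid=%s\n' \
        "$prefix" "$newest" "$oldest" "$id" "${guid:-none}"
}

# Succeed if the bucket overlaps [earliest, latest]; an empty bound is ignored.
# Usage: bucket_in_window <bucket path> <earliest epoch or ""> <latest epoch or "">
bucket_in_window() {
    local prefix newest oldest rest
    IFS=_ read -r prefix newest oldest rest <<< "$(basename "$1")"
    # the newest event must not be before the window start
    [ -z "$2" ] || [ "$newest" -ge "$2" ] || return 1
    # the oldest event must not be after the window end
    [ -z "$3" ] || [ "$oldest" -le "$3" ] || return 1
}
```

Usage, with an illustrative epoch: `for b in /opt/splunk/var/lib/splunk/*/thaweddb/*; do bucket_in_window "$b" 1268000000 "" && echo "$b"; done`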
Curious how this went and if you are able to post your version of the above for clusters? I really feel that Splunk should package an example script like this in the distribution (they do that type of thing for some other stuff already) but since they don't I was hoping to keep an up to date community solution available here.
So I think that you shouldn't have to restore the RB but you'll get errors if you don't. (I think that's a bug but I'm not sure and since I restored all my data already, it doesn't matter to me anymore.)
More info at thawing-data-in-an-indexer-clustering-environment
I used it and it worked for me but USE AT YOUR OWN RISK!!!
I have this script on all of my indexers and yes, it does time out if it runs for too long.
restore_buckets.sh
#!/bin/bash
# This script will copy frozen indexes to the thaweddb and restore them.
# The user will be prompted for index, start time and end time.
# The user will be prompted to list files to restore or restore.
echo -n "$(tput setaf 2)$(tput bold)"
echo -n "Enter the index you need to restore :$(tput setaf 7) "
read index
echo -n "$(tput setaf 2)$(tput bold)"
echo -n "Enter the start time in epochtime :$(tput setaf 7) "
read startTime
echo -n "$(tput setaf 2)$(tput bold)"
echo -n "Enter the end time in epochtime :$(tput setaf 7) "
read endTime
echo " $(tput bold)"
echo " $(tput setaf 2)Restoring : $(tput setaf 7)$index"
echo " $(tput setaf 2) From : $(tput setaf 7)`date -d @$startTime +\"%Y-%m-%d %H:%M:%S\"`"
echo " $(tput setaf 2) Through : $(tput setaf 7)`date -d @$endTime +\"%Y-%m-%d %H:%M:%S\"`"
#echo " $(tput setaf 2)File Count : $(tput setaf 7)`ls -dA $SPLUNK_DB/$index/frozendb/* | awk -v et=$endTime -v st=$startTime \
##
## Added to fix awk problem when underscores are in the index name.
cd "$SPLUNK_DB/$index" || exit 1
echo " $(tput setaf 2)File Count : $(tput setaf 7)`ls -dA frozendb/* | awk -v et=$endTime -v st=$startTime \
'BEGIN {FS = "_"} $2 <= et && $2 >= st {print $0}' | wc -l`"
echo " $(tput setaf 2)"
echo -n "List files [y/n]: $(tput setaf 7)"
echo -n " $(tput sgr0)"
read listFiles
if [ "$listFiles" != "n" ]; then
echo "Start $(tput setaf 2)End$(tput setaf 7) File"
# ls -dA $SPLUNK_DB/$index/frozendb/* | awk -v et=$endTime -v st=$startTime 'BEGIN {FS = "_"} { "date -d @"$2 " +\"%Y-%m-%d %H:%M:%S\"" \
##
## Added to fix awk problem when underscores are in the index name.
ls -dA frozendb/* | awk -v et=$endTime -v st=$startTime 'BEGIN {FS = "_"} { "date -d @"$2 " +\"%Y-%m-%d %H:%M:%S\"" \
| getline ET ; "date -d @"$3 " +\"%Y-%m-%d %H:%M:%S\"" | getline ST } $2 <= et && $2 >= st \
{printf("%s\t\033[32m%s\033[0m\t%s_\033[32m%s\033[0m_%s_%s_%s\n",ST,ET,$1,$2,$3,$4,$5)}' | sort -k3
fi
echo "$(tput bold)"
echo "$(tput setaf 2)This will copy files from $(tput setaf 1)$index/frozendb $(tput setaf 2)to $(tput setaf 1)$index/thaweddb."
echo
echo -n "$(tput setaf 2)Enter $(tput setaf 3)\"c\" $(tput setaf 2)to begin copying files. Any other input will skip this step. : $(tput setaf 3)"
read startCopy
if [ "$startCopy" == "c" ]; then
echo "$(tput setaf 3)Copying files."
# ls -dA $SPLUNK_DB/$index/frozendb/* | awk -v et=$endTime -v st=$startTime 'BEGIN {FS = "_"} $2 <= et && $2 >= st {print $0}' \
##
## Added to fix awk problem when underscores are in the index name.
ls -dA frozendb/* | awk -v et=$endTime -v st=$startTime 'BEGIN {FS = "_"} $2 <= et && $2 >= st {print $0}' \
| xargs -I BUCKET /bin/cp -r BUCKET thaweddb
echo "$(tput setaf 7)Done."
fi
echo
echo -n "$(tput setaf 2)Enter $(tput setaf 3)\"r\" $(tput setaf 2)to begin the restore. Any other input will skip this step. : $(tput setaf 3)"
read doRestore
# The splunk rebuild command always prints its USAGE message and two other lines, so stderr is sent to /dev/null.
# The downside is that you won't see the results of the rebuild.
if [ "$doRestore" == "r" ]; then
echo "$(tput setaf 3)Starting restore."
# ls -dA $SPLUNK_DB/$index/thaweddb/* | awk -v et=$endTime -v st=$startTime 'BEGIN {FS = "_"} $2 <= et && $2 >= st {print $1"_"$2"_"$3"_"$4"_"$5}' \
##
## Added to fix awk problem when underscores are in the index name.
ls -dA thaweddb/* | awk -v et=$endTime -v st=$startTime 'BEGIN {FS = "_"} $2 <= et && $2 >= st {print $1"_"$2"_"$3"_"$4"_"$5}' \
| xargs -I BUCKET --max-procs=10 $SPLUNK_HOME/bin/splunk rebuild BUCKET 2>/dev/null
echo "$(tput setaf 7)Restore complete. Splunk needs to be restarted."
echo
fi
echo "$(tput setaf 7)$(tput sgr0)"
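The prompts in restore_buckets.sh expect epoch times. A minimal sketch of a conversion helper, assuming GNU date (the script already relies on GNU `date -d @...` for its own output); `to_epoch` and `from_epoch` are hypothetical names.

```shell
#!/usr/bin/env bash
# to_epoch: hypothetical helper that turns a human-readable timestamp into
# epoch seconds for the start/end prompts. Requires GNU date (-d).
to_epoch() {
    date -d "$1" +%s
}

# from_epoch: the reverse, in the same format the script prints.
from_epoch() {
    date -d "@$1" +"%Y-%m-%d %H:%M:%S"
}
```

For example, `to_epoch "2023-01-01 00:00:00"` gives the value to type at the start-time prompt (interpreted in the local time zone unless you append one, such as "UTC").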
Are you running Splunk on Linux? Then try this:
cd var/lib/splunk/eucp/tempdb && ls -d "$PWD"/* | xargs -I {} /var/opt/splunk/bin/splunk rebuild {}
If you are running Windows, try this:
for /F "usebackq delims=" %i in (`dir /B "%SPLUNK_HOME%\var\lib\splunk\{INDEX_NAME}\thaweddb\temp-*"`) do "%SPLUNK_HOME%\bin\splunk" rebuild "%SPLUNK_HOME%\var\lib\splunk\{INDEX_NAME}\thaweddb\%i"
where "{INDEX_NAME}" is the name of your index that holds the archived data to be restored.
NOTE: It has been a long time since I've written a DOS "for" command, and I don't have a Windows Splunk instance to test this on, so if Windows is your environment, this may not be coded exactly as it needs to be to get the job done.
The restore procedure and naming convention follow the guidance of the Splunk doc article "Restore archived indexed data". You need to make sure that the sequence number in each bucket name (the _\d+ at the end of the name) does not duplicate one already in the db or colddb directories; duplicates will cause big problems for Splunk.
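A quick way to check for such collisions before rebuilding is sketched below. It assumes the standard db/colddb/thaweddb layout under an index directory and treats everything after the two epoch fields (the sequence number, plus the GUID on clustered indexers) as the bucket's ID. `bucket_ids` and `find_duplicate_ids` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the ID part of each bucket directory in $1: the sequence number, plus
# the "_<guid>" suffix on clustered indexers (fields 4 and later of the name).
bucket_ids() {
    local b
    for b in "$1"/db_* "$1"/rb_*; do
        [ -d "$b" ] || continue   # skip glob patterns that matched nothing
        basename "$b" | cut -d_ -f4-
    done
}

# Print IDs that appear both in thaweddb and in db or colddb.
# Usage: find_duplicate_ids <index directory>
find_duplicate_ids() {
    local indexDir=$1
    # comm needs both inputs sorted with the same collation, hence LC_ALL=C
    LC_ALL=C comm -12 \
        <(bucket_ids "$indexDir/thaweddb" | LC_ALL=C sort -u) \
        <({ bucket_ids "$indexDir/db"; bucket_ids "$indexDir/colddb"; } | LC_ALL=C sort -u)
}
```

Usage (path is illustrative): `find_duplicate_ids /opt/splunk/var/lib/splunk/myindex`; any ID it prints should be renamed in thaweddb before the rebuild.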
Does the Windows for loop rebuild concurrently or sequentially? If yours is sequential, I may consider creating a concurrent rebuild script for Windows as well.
Your solution for Linux is elegantly simple (that is a compliment); however, I'm not sure it will work for every use case. For example, the OP said they had "months and months" of data to thaw, which probably means hundreds of buckets, and xargs spawns a separate rebuild process for every one of them. By default xargs runs those processes one at a time, so the rebuilds happen sequentially; starting all of them at once instead is likely to slow things down rather than speed them up (for reasons I won't go into unless someone really wants me to elaborate). xargs accepts arguments that cap the size of the process pool running the rebuilds, which should improve your solution dramatically. Take a look at the following (in particular the --max-procs argument):
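As a rough illustration of the --max-procs suggestion: `-P` (the short form of GNU xargs' --max-procs) caps how many commands run at once, and `-n 1` hands each command a single bucket. `rebuild_all` is a hypothetical wrapper that takes the command to run as arguments, so you can try it with a harmless stand-in such as echo before pointing it at splunk rebuild.

```shell
#!/usr/bin/env bash
# rebuild_all <maxProcs> <command...>
# Reads NUL-delimited bucket paths on stdin and runs "<command...> <bucket>"
# for each one, with at most <maxProcs> commands running at the same time.
rebuild_all() {
    local maxProcs=$1
    shift
    # -0: NUL-delimited input (safe for any path), -n 1: one bucket per
    # command, -P: size of the process pool
    xargs -0 -n 1 -P "$maxProcs" "$@"
}
```

Usage (index path is illustrative): `printf '%s\0' /opt/splunk/var/lib/splunk/myindex/thaweddb/db_* | rebuild_all 10 /opt/splunk/bin/splunk rebuild`. If you also need to pass the index name after the bucket, switch to the `-I BUCKET` form used elsewhere in this thread.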
http://coldattic.info/shvedsky/pro/blogs/a-foo-walks-into-a-bar/posts/7
For anyone who doesn't need as elaborate a solution as the one I posted, this is probably the way to go.
Example (not tested):
ls -dA $SPLUNK_HOME/var/lib/splunk/{index1,index2}/thaweddb/* | awk -v et={epoch-of-earliest-event-to-restore} -v lt={epoch-of-latest-event-to-restore} 'BEGIN {FS = "_"} $2 >= et && $3 <= lt {print $0}' | xargs -I BUCKET --max-procs=10 sudo -H -u splunk $SPLUNK_HOME/bin/splunk rebuild BUCKET
Replace {epoch-of-earliest-event-to-restore} and {epoch-of-latest-event-to-restore} with actual epoch values in the example above or omit either earliest or latest time or both entirely (like below which only enforces the earliest time):
ls -dA $SPLUNK_HOME/var/lib/splunk/{index1,index2}/thaweddb/* | awk -v et={epoch-of-earliest-event-to-restore} 'BEGIN {FS = "_"} $2 >= et {print $0}' | xargs -I BUCKET --max-procs=10 sudo -H -u splunk $SPLUNK_HOME/bin/splunk rebuild BUCKET
One might also consider using something like "GNU parallel" as well.
This has no logging or status indicators, and it does not check for tsidx files in a bucket before rebuilding (so if you re-run the command, it will rebuild the same buckets again). It's not perfect, but it should work for some.
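One way to close that re-run gap is a filter that drops buckets that already contain .tsidx files, the same check the longer script uses. `needs_rebuild` is a hypothetical name; it relies on bash's `compgen -G` builtin, which succeeds only if the glob matches at least one file.

```shell
#!/usr/bin/env bash
# needs_rebuild: hypothetical filter that reads bucket paths on stdin (one per
# line) and prints only those that don't contain any .tsidx file yet.
needs_rebuild() {
    local bucket
    while IFS= read -r bucket; do
        if ! compgen -G "$bucket/*.tsidx" > /dev/null; then
            printf '%s\n' "$bucket"
        fi
    done
}
```

Usage (index path is illustrative): `ls -d /opt/splunk/var/lib/splunk/myindex/thaweddb/db_* | needs_rebuild | xargs -I BUCKET --max-procs=10 /opt/splunk/bin/splunk rebuild BUCKET`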