
Wrong retention caused buckets to freeze in cluster - need to restore frozen data

Communicator

What happened is that we were changing the cold volume for an index. The change altered the size of the warm db and forced buckets to roll to cold.
The buckets rolled to cold, and now we can't search the old data.
The buckets are in colddb and can still be searched by running the search directly on the search peers,
but each bucket shows the frozen flag as set. How do we remove the frozen flag?


Splunk Employee

Detailed steps with correction on some syntax:

WARNING - make sure you have a full backup before going forward.

Step 1 - place CM in maintenance-mode
./splunk enable maintenance-mode

Step 2 - On each peer:

  1. stop splunkd
  2. go to each index directory (db and colddb):
    2.1 grep -Rle '1$' --include bucket_info.csv . | xargs sed -i 's/1$/0/g'
    2.2 mv .bucketManifest /tmp/.bucketManifest   # back up the bucket manifest file for each index; use a different destination name per index, otherwise each move overwrites the previous backup
  3. repeat step 2 until all indexes are done
  4. start splunkd

Step 3 - restart CM (IMPORTANT: this allows the latest cluster frozen flags to be updated on the CM)
Step 4 - confirm all peers "UP" then disable maintenance-mode
./splunk disable maintenance-mode
Step 5 - confirm in the GUI that buckets are being replicated
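
Step 2.2 as written moves every index's manifest to the same /tmp/.bucketManifest path, so only the last one survives. One possible way around that is to keep each manifest under a backup directory that mirrors its original location. This is only a sketch run against a throwaway tree: the index names (myindex, other), the indexes/ directory and the manifest_backup directory are made up for illustration, and the layout (<index>/db/.bucketManifest) is an assumption taken from this thread.

```shell
#!/bin/sh
# Throwaway tree standing in for the index directories (hypothetical names).
WORK=$(mktemp -d)
mkdir -p "$WORK/indexes/myindex/db" "$WORK/indexes/other/db"
echo "manifest-a" > "$WORK/indexes/myindex/db/.bucketManifest"
echo "manifest-b" > "$WORK/indexes/other/db/.bucketManifest"

# Backup location outside the index tree (hypothetical path).
BACKUP="$WORK/manifest_backup"

cd "$WORK/indexes" || exit 1
find . -name .bucketManifest | while IFS= read -r m
do
    # Recreate the manifest's relative directory under the backup root,
    # so manifests from different indexes cannot overwrite each other.
    dest="$BACKUP/$(dirname "$m")"
    mkdir -p "$dest"
    mv "$m" "$dest/"
    echo "moved $m -> $dest/"
done
```

On a real peer the same loop would be run with splunkd stopped, from the directory that holds the index directories, with BACKUP pointing somewhere outside the Splunk data path.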

==============================================
P.S. Additional script to do the job automatically through all buckets:

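#!/bin/ksh
# Assumes it is run from the directory that contains the index directories
# (the index name is taken from the second field of each path).
# Clears the frozen flag in every bucket_info.csv, then moves each
# .bucketManifest found next to the buckets into /tmp.
for CSV in $(grep -Rle '1$' --include bucket_info.csv . )
do
    # Rewrite through a temp file and copy the contents back,
    # so the original file keeps its owner and permissions.
    sed 's/1$/0/g' "$CSV" > /tmp/HOLD.out
    cat /tmp/HOLD.out > "$CSV"
    # Two levels above bucket_info.csv is the db or colddb directory.
    DIR_NAME=$(dirname "$(dirname "$CSV")")
    if [ -f "${DIR_NAME}/.bucketManifest" ]
    then
        Index_Name=$(print "$DIR_NAME" | awk -F "/" '{print $2}')
        print "${DIR_NAME}/.bucketManifest -> /tmp/${Index_Name}.bucketManifest"
        mv "${DIR_NAME}/.bucketManifest" "/tmp/${Index_Name}.bucketManifest"
    fi
done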
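
The script above overwrites each bucket_info.csv without keeping the original. If a per-file copy is wanted in addition to the full backup, one option is sed's -i.bak form, which edits in place and leaves the original beside it. The sketch below runs against a throwaway tree; the index and bucket names (myindex, db_1_0_1) are hypothetical, and whether stray .bak files inside a bucket directory matter to splunkd is not something this thread confirms, so it is probably safest to move them out before starting splunkd.

```shell
#!/bin/sh
# Throwaway tree with one bucket flagged as frozen (hypothetical names).
WORK=$(mktemp -d)
mkdir -p "$WORK/myindex/db/db_1_0_1"
CSV="$WORK/myindex/db/db_1_0_1/bucket_info.csv"
printf '"indextimeet","indextimelt","frozenincluster"\n1484368345,1484385061,1\n' > "$CSV"

cd "$WORK" || exit 1
# List every bucket_info.csv that has a line ending in 1, one path per line.
find . -name bucket_info.csv -exec grep -l '1$' {} + |
while IFS= read -r f
do
    # -i.bak edits the file in place and keeps the original as <file>.bak
    sed -i.bak 's/1$/0/' "$f"
    echo "edited $f (original kept as $f.bak)"
done
```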

Splunk Employee
  1. Put the CM in maintenance mode, stop all CPs (cluster peers), then stop the CM.

./splunk enable maintenance-mode
./splunk stop

  2. On each CP, cd into the directory: ./splunk/var/lib/splunk//db
  3. Run the command:

grep -Rle '1$' --include bucket_info.csv .

This lists the bucket_info.csv files of buckets marked as frozen; verify that these are the only ones.
Example file contents:
"indextimeet","indextimelt","frozenincluster"
1484368345,1484385061,1

  4. Run the command (to unfreeze the buckets):
    grep -Rle '1$' --include bucket_info.csv . | xargs sed -i 's/1$/0/g'

    This locates the frozen buckets and replaces the 1 (frozen=true) with 0 (frozen=false).

  5. Repeat steps 2-4 for the colddb directory: ./splunk/var/lib/splunk//colddb

  6. Locate the .bucketManifest file by running ls -la within the ./splunk/var/lib/splunk/ directory.

  7. Move the .bucketManifest file out of the Splunk directory into a tmp directory; during startup the file is regenerated with the new information.

  8. Start the CM, put it in maintenance mode, start the CPs, then take the CM out of maintenance mode:
    ./splunk disable maintenance-mode
    ./splunk show maintenance-mode
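
The grep in step 3 matches any line that ends in 1. A column-based check is a possible alternative for the verification, since it looks only at the third field (frozenincluster) of the data row and also prints the two time values. This is a rough sketch against a throwaway directory; the bucket directory names are made up, and the three-column format is taken from the example above.

```shell
#!/bin/sh
# Throwaway db directory with one frozen and one non-frozen bucket (hypothetical names).
WORK=$(mktemp -d)
mkdir -p "$WORK/db/db_1_0_1" "$WORK/db/db_2_1_2"
printf '"indextimeet","indextimelt","frozenincluster"\n1484368345,1484385061,1\n' > "$WORK/db/db_1_0_1/bucket_info.csv"
printf '"indextimeet","indextimelt","frozenincluster"\n1484368345,1484385061,0\n' > "$WORK/db/db_2_1_2/bucket_info.csv"

cd "$WORK/db" || exit 1
# FNR > 1 skips the header row of each file; $3 is the frozenincluster column.
REPORT=$(find . -name bucket_info.csv -exec awk -F, \
    'FNR > 1 && $3 == 1 { print FILENAME ": et=" $1 " lt=" $2 }' {} +)
echo "$REPORT"
```

Run again after step 4, the report should come back empty if every flag was cleared.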