In our indexer cluster, we have a number of indexes that have a coldToFrozenDir specified along with both maxTotalDataSizeMB and frozenTimePeriodInSecs.
As expected based on this document: http://docs.splunk.com/Documentation/Splunk/6.2.2/Indexer/Automatearchiving#Clustered_data_archiving
we are seeing multiple copies of each bucket being frozen across our indexers: one copy that starts with db_ and the others that start with rb_.
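For reference, the relevant per-index settings in indexes.conf look roughly like this (the index name, archive path, and limits below are illustrative, not our actual values):

[firewall_logs]
homePath = $SPLUNK_DB/firewall_logs/db
coldPath = $SPLUNK_DB/firewall_logs/colddb
thawedPath = $SPLUNK_DB/firewall_logs/thaweddb
# copy buckets here instead of deleting them when they roll to frozen
coldToFrozenDir = /opt/splunk_frozen/firewall_logs
# freeze when the index exceeds ~500 GB or when events are older than one year
maxTotalDataSizeMB = 512000
frozenTimePeriodInSecs = 31536000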
Per the following answers:
https://answers.splunk.com/answers/258816/thawing-data-in-an-indexer-clustering-environment.html
https://answers.splunk.com/answers/153341/thawed-buckets-error-clusterslavebuckethandler-failed-to-t...
it sounds like we can thaw our data outside the cluster, on a standalone indexer, using only the db_ buckets.
If that is the case, is it safe for us to just delete all the rb_ frozen buckets? Will we face any data loss by only using the db_ buckets to thaw frozen data from the cluster?
Yes, it is safe to delete the rb_ frozen buckets.
In fact, I adjusted my coldToFrozenScript to simply discard them.
Just note that if for some reason you have a search factor of 1 and the db_ copy of the bucket somehow becomes corrupted, then you'll have data loss. In most cases the search factor is greater than 1, so you'll have two copies of the same data in db_ files; if one becomes corrupted, you should still have another searchable copy backed up.
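If you do later need to thaw one of those db_ frozen buckets, the process on a standalone indexer is roughly: copy the bucket into the index's thaweddb directory, rebuild its index files, and restart Splunk. A rough sketch, with an illustrative index name and bucket directory:

cp -r /opt/splunk_frozen/firewall_logs/db_1449273186_1449100000_66 $SPLUNK_HOME/var/lib/splunk/firewall_logs/thaweddb/
$SPLUNK_HOME/bin/splunk rebuild $SPLUNK_HOME/var/lib/splunk/firewall_logs/thaweddb/db_1449273186_1449100000_66
$SPLUNK_HOME/bin/splunk restart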
Hi Jkat54,
Can you advise what changes to make in the script to exclude rb_* buckets from archiving?
# This script is meant to be placed in $SPLUNK_HOME/bin/, should be marked as executable
# by the splunk user, and needs to be specified in indexes.conf.
# You must modify the ARCHIVE_DIR variable below.
import sys, os, gzip, shutil, subprocess, random

# If this is executed using the Splunk python environment, then SPLUNK_HOME will exist.
# The line below joins the value of SPLUNK_HOME with 'frozenData', creating a final path
# of $SPLUNK_HOME/frozenData/{frozen buckets}.
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenData')

# Another option is to specify a location with os.path.join and quotes around each
# directory name, as shown below:
#ARCHIVE_DIR = os.path.join('/opt', 'frozenData')
# The above would put frozen buckets here: /opt/frozenData
def archiveBucket(base, files):
    # Remove everything in the bucket except directories (the rawdata directory is kept);
    # index files can be rebuilt later with 'splunk rebuild' when thawing.
    print 'Archiving bucket: ' + base
    for f in files:
        full = os.path.join(base, f)
        if os.path.isfile(full):
            os.remove(full)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit('usage: python cold2frozen.py <bucket_path>')

    if not os.path.isdir(ARCHIVE_DIR):
        try:
            os.mkdir(ARCHIVE_DIR)
        except OSError:
            # Another concurrent invocation may already have created the directory.
            sys.stderr.write("mkdir warning: Directory '" + ARCHIVE_DIR + "' already exists\n")

    bucket = sys.argv[1]
    if not os.path.isdir(bucket):
        sys.exit('Given bucket is not a valid directory: ' + bucket)

    rawdatadir = os.path.join(bucket, 'rawdata')
    if not os.path.isdir(rawdatadir):
        sys.exit('No rawdata directory, given bucket is likely invalid: ' + bucket)

    files = os.listdir(bucket)
    journal = os.path.join(rawdatadir, 'journal.gz')
    if os.path.isfile(journal):
        archiveBucket(bucket, files)
    else:
        sys.exit('No journal file found, bucket invalid: ' + bucket)

    if bucket.endswith('/'):
        bucket = bucket[:-1]

    if os.path.basename(bucket).startswith('rb_'):
        # Replicated bucket, so quit / do nothing. This means replicated buckets will not
        # be copied to the archive by this script; remove or expand this condition as needed.
        sys.exit(0)

    indexname = os.path.basename(os.path.dirname(os.path.dirname(bucket)))
    destdir = os.path.join(ARCHIVE_DIR, indexname, os.path.basename(bucket))

    while os.path.isdir(destdir):
        print 'Warning: This bucket already exists in the archive directory'
        print 'Adding a random extension to this directory...'
        destdir += '.' + str(random.randrange(10))

    shutil.copytree(bucket, destdir)
On Windows you may need to use os.sep in the path variables, for example:
ARCHIVE_DIR = os.path.join("c:", os.sep, "splunk", "frozenPath")
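Once the script is saved (for example as $SPLUNK_HOME/bin/cold2frozen.py), you point each index at it with coldToFrozenScript instead of coldToFrozenDir; something along these lines in indexes.conf (index name and paths are illustrative):

[firewall_logs]
coldToFrozenScript = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/bin/cold2frozen.py"

Splunk passes the bucket directory as the single argument when the bucket rolls to frozen, and removes the bucket from the index once the script exits successfully.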