Deployment Architecture

Restoring archived indexed data

ChhayaV
Communicator

Hi,

I've archived indexed data into the location "D:\Program Files\Splunk\myfrozenarchive", and the myfrozenarchive folder now has two folders:

db_1364755264_1356979773_16
db_1364971832_1364756312_15

Both of these in turn contain a folder called rawdata.

I'm trying to restore this archived data, but I'm getting an error in one of the steps mentioned in the documentation:

Windows users
Here is an example of safely restoring a 4.2+ archive bucket to thawed:
1. Copy your archive bucket to the thawed directory:
> xcopy D:\MyArchive\db_1181756465_1162600547_0 %SPLUNK_HOME%\var\lib\splunk\defaultdb\thaweddb\ /s /e /v

2. Execute the rebuild command on the temporary bucket to rebuild the Splunk indexes and associated files:
> splunk rebuild %SPLUNK_HOME%\var\lib\splunk\defaultdb\thaweddb\temp_db_1181756465_1162600547_0

3. Rename the temporary bucket to something that Splunk will recognize:
> cd %SPLUNK_HOME%\var\lib\splunk\defaultdb\thaweddb
> move temp_db_1181756465_1162600547_0 db_1181756465_1162600547_1001
Note: You must choose a bucket id that does not conflict with any other bucket in
the index. This example assumes that the bucket id '1001' is unique for the index.
If it isn't, choose some other, non-conflicting bucket ID.

4. Restart Splunk. Go to %SPLUNK_HOME%\bin and run this command:
> splunk restart

I'm getting the following error in the 2nd step while trying to rebuild.

Command:

D:\Program Files\splunk\bin> splunk rebuild "D:\Program Files\splunk\var\lib\splunk\defaultdb\thaweddb\temp_db_1364755264_1356979773_16"


ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
ERROR - Error opening 'D:\Program Files\Splunk\var\log\splunk\splunkd.log': The process cannot access the file because it is being used by another process.
Error: The path D:\Program Files\Splunk\var\lib\splunk\defaultdb\thaweddb\temp_db_1364755264_1356979773_16 is not a directory.
Rebuilding bucket failed

Can anyone please elaborate on the 2nd and 3rd steps?
Version used: Splunk 5.0.2

[Screenshot: my thaweddb folder]

Thank You.

ChrisG
Splunk Employee

Because it is referred to in the discussion: "Restore archived indexed data" in the Managing Indexes and Clusters manual is the official documentation for this subject. It does not present a scripted solution, though.

0 Karma

jkat54
SplunkTrust

restor.ps1 (for Windows PowerShell users)

# This script will fsck archived/restore data and then put it into the index
# It is designed for Splunk 4.1+, developed and tested on Splunk 6.1.2
# It is meant to be run as the SplunkD user

# VERY IMPORTANT: HUGE ASSUMPTION
# YOU HAVE ALREADY SCRUBBED YOUR BUCKETS!!!
# Bucket scrubbing = ensuring that you will not restore any archived/backup buckets on top of existing buckets within an index (making sure the restored bucket names aren't the same as existing ones).
# The buckets may be found in $INDEX_PATH\db, $INDEX_PATH\colddb, $INDEX_PATH\thaweddb, etc. They have a name that starts with db_ or rb_, where db = normal copy and rb = replicated copy.
# YOU MUST SCRUB YOUR BUCKETS BEFORE YOU RUN THIS SCRIPT OR YOU MAY BECOME THE NEEDLE IN THE HAYSTACK

$splunkExePath="c:\program files\splunk\bin\splunk.exe" # path to splunk.exe
$archiveBucketPath="c:\splunkbackups" # path to your SCRUBBED restore/archive buckets (db_ & rb_)
$restoreTemp="c:\temp"  # path to a temp location that splunk.exe can access; it should have enough storage for all the buckets you'll restore
$restoreToIndexPath="c:\splunk\index_name"  # path to the index you want to restore into. Most Splunkers recommend restoring into a new index if possible. The script assumes this exists.
$restoreToIndexThawedPath="c:\splunk\index_name\thaweddb"  # path to the thaweddb folder in the index you're restoring to

# find the db_ and rb_ buckets and process each of them
$array=@(ls $archiveBucketPath| Where-Object {$_.Name -like "rb_*" -or $_.Name -like "db_*"} | select-object -expandproperty FullName)
foreach ($bucketdir in $array) {

    $bucketname=$($bucketdir -replace ".*\\","")

    # make temp dir
    if (!$(test-path $restoreTemp)) {
        mkdir $restoreTemp
    }

    # copy bucket to temp dir
    if (!$(test-path $restoreTemp/$bucketname)) {
        copy-item $bucketdir $restoreTemp/$bucketname -Recurse #-WhatIf  
    }

    # rebuild (fsck) the bucket; the & call operator is required to run a path stored in a variable
    & $splunkExePath rebuild "$restoreTemp\$bucketname"

    # move the fsck'd bucket into the thaweddb directory
    if (!$(test-path $restoreToIndexThawedPath\$bucketname)) {
        move $restoreTemp\$bucketname $restoreToIndexThawedPath\$bucketname #-WhatIf
    }

    # if meta.dirty doesn't exist, create it
    if (!$(test-path $restoreToIndexPath\meta.dirty)) {
        New-Item $restoreToIndexPath\meta.dirty -type file #-WhatIf
    }

    # if the processed directory doesn't exist, create it
    if (!$(test-path $archiveBucketPath\processed\)) {
        mkdir $archiveBucketPath\processed
    }
    move $bucketdir $archiveBucketPath\processed\

    #counter
    $x=$x+1

    # give user chance to escape
    write-host "We've just completed $($x) of $($array.count)"
    # OPTIONAL: If you uncomment the below, this script will give you 15s to break operation after each bucket is processed.
    # echo You can break in the next 15s without losing any progress
    # sleep 15
    echo proceeding

}

echo Total buckets restored: $x
0 Karma

jkat54
SplunkTrust

For the restor.sh, change splunkExePath to splunkBinPath everywhere you see it mentioned.

0 Karma

osirismdw
New Member

For others trying to use this: in PowerShell you need to put an & (the call operator) at the beginning of line 34, the line that runs $splunkExePath rebuild, for this to work. It took me a while to figure that out.

0 Karma

jkat54
SplunkTrust

restor.sh (for unix users)

#!/bin/bash

# This script will fsck archived/restore data and then put it into the index
# It is designed for Splunk 4.1+, developed and tested on Splunk 6.1.2
# It is meant to be run as the SplunkD user

# VERY IMPORTANT: HUGE ASSUMPTION
# YOU HAVE ALREADY SCRUBBED YOUR BUCKETS!!!
# Bucket scrubbing = ensuring that you will not restore any archived/backup buckets on top of existing buckets within an index (making sure the restored bucket names aren't the same as existing ones).
# The buckets may be found in $INDEX_PATH/db, $INDEX_PATH/colddb, $INDEX_PATH/thaweddb, etc. They have a name that starts with db_ or rb_, where db = normal copy and rb = replicated copy.
# YOU MUST SCRUB YOUR BUCKETS BEFORE YOU RUN THIS SCRIPT OR YOU MAY BECOME THE NEEDLE IN THE HAYSTACK

splUser=splunk
splunkBinPath=/opt/splunk/bin/splunk # path to the splunk binary / executable
archiveBucketPath=/var/splunk/backups/index_name/db  # path to your SCRUBBED restore/archive buckets (db_ & rb_)
restoreTemp=/var/splunk/index_name/temp  # path to a temp location that splunk can access; it should have enough storage for all the buckets you'll restore
restoreToIndexPath=/var/splunk/index_name/db  # path to the index you want to restore into. Most Splunkers recommend restoring into a new index if possible. The script assumes this exists.
restoreToIndexThawedPath=/var/splunk/index_name/thaweddb  # path to the thaweddb folder in the index you're restoring to

# find the db_ and rb_ buckets (direct children of the archive path only) and process each of them
for i in $(find "$archiveBucketPath" -maxdepth 1 -type d \( -name "db_*" -o -name "rb_*" \)); do sourceBuckets+=("$i"); done
for bucketdir in ${sourceBuckets[@]}; do

    let x=$x+1
    bucketname=$(echo $bucketdir|sed "s/.*\///g")

    # make temp dir
    if [ ! -d $restoreTemp ]; then mkdir $restoreTemp;fi

    # copy bucket to temp dir & chown
    cp -r $bucketdir $restoreTemp/$bucketname
    chown -Rf $splUser:$splUser $restoreTemp/$bucketname

    # rebuild (fsck) the bucket
    $splunkBinPath rebuild $restoreTemp/$bucketname

    # move the fsck'd bucket into the thaweddb directory
    mv $restoreTemp/$bucketname $restoreToIndexThawedPath/$bucketname
    chown -Rf $splUser:$splUser $restoreToIndexThawedPath/$bucketname

    # if meta.dirty doesn't exist, touch & chown it
    if [ ! -f $restoreToIndexPath/meta.dirty ]; then
            touch $restoreToIndexPath/meta.dirty;
            chown $splUser:$splUser $restoreToIndexPath/meta.dirty;
    fi

    # if the processed directory doesn't exist, create it
    if [ ! -d $archiveBucketPath/processed/ ];
            then mkdir $archiveBucketPath/processed/;
    fi
    mv $bucketdir $archiveBucketPath/processed/

    # give user chance to escape
    echo "We've just completed $x of ${#sourceBuckets[@]}"
    # OPTIONAL: If you uncomment the below, this script will give you 15s to break operation after each bucket is processed.
    # echo You can break in the next 15s without losing any progress
    # sleep 15
    echo ... proceeding

done

echo Total buckets restored: $x
0 Karma

ChhayaV
Communicator

Thank you. I restored the data.

I have one more question: the db_xx_16 bucket is restored, but the second one, db_xx_15, is not getting copied to the thaweddb folder.

If we have 100 such buckets, db_xx_01 to db_xx_100, how can we restore all of them in a single shot? Or do we have to do this one by one?

0 Karma

ChhayaV
Communicator

Hi kristian,

There is no temp_db_xx; I didn't rename it. That's a silly mistake on my side.
But even if I give the bucket name as db_xx, it gives me the same error.
I checked the thaweddb folder: it contains a rawdata folder, which has a journal.gz file, so I tried the path
thaweddb\rawdata
This time the error is "unable to find rawdata directory".

0 Karma

lukejadamec
Super Champion

The error opening splunkd.log happens because you are rebuilding while Splunk is running. This will not cause the rebuild to fail, but it will probably result in lost log messages from the rebuild process. The last error means your path to the bucket was wrong.

My preference is to rebuild while splunk is down, and copy the path directly out of the explorer window.
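
As a rough illustration of the path problem, a pre-flight check along these lines could catch both errors seen in this thread ("is not a directory" and "unable to find rawdata directory") before rebuild is run. This is only a bash sketch; on Windows the same checks could be done with Test-Path. The function name check_bucket is made up for this example and is not part of Splunk.

```shell
#!/bin/bash

# Hypothetical helper: verify that a path looks like a bucket directory
# before handing it to "splunk rebuild".
# Prints a short diagnosis and returns non-zero when something is missing.
check_bucket() {
    local bucket="$1"

    # rebuild expects the bucket directory itself, not its parent or a file
    if [ ! -d "$bucket" ]; then
        echo "not a directory: $bucket"
        return 1
    fi

    # an archived bucket keeps its data under <bucket>/rawdata
    if [ ! -d "$bucket/rawdata" ]; then
        echo "no rawdata directory in: $bucket"
        return 1
    fi

    echo "ok: $bucket"
}
```

The path passed in should be the bucket folder itself (the one that contains rawdata), not thaweddb and not thaweddb\rawdata.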

0 Karma

jkat54
SplunkTrust

I gave some examples of automating this process that you may find useful. They'll find the db_ and rb_ buckets in your archive path / backup path, and rebuild then place them in a thaweddb location of your choice. They don't check for bucket collision so you must manually scrub your buckets first. The scripts do not import hot buckets from the archive / backup. Hot buckets could be handled as follows: put them on a new splunk instance, roll them to cold, scrub the buckets, then run the scripts on the newly created cold buckets.
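
A minimal sketch of what the manual scrubbing check could look like, assuming bucket directories are named db_<newest>_<oldest>_<id> (or rb_...) as in this thread, and assuming a collision means that an archived bucket's ID is already in use in the target index. The function names bucket_id and find_collisions are hypothetical helpers for illustration, not Splunk commands.

```shell
#!/bin/bash

# Print the bucket ID, i.e. the fourth underscore-separated field
# of a bucket directory name such as db_1364755264_1356979773_16.
bucket_id() {
    basename "$1" | cut -d_ -f4
}

# List archived buckets whose ID is already used in any of the index directories.
# Usage: find_collisions ARCHIVE_DIR INDEX_DIR [INDEX_DIR...]
find_collisions() {
    local archive="$1"
    shift
    local dir b id used=" "

    # collect the IDs already present in db, colddb, thaweddb, ...
    for dir in "$@"; do
        for b in "$dir"/db_* "$dir"/rb_*; do
            if [ -d "$b" ]; then
                used="$used$(bucket_id "$b") "
            fi
        done
    done

    # report every archived bucket whose ID is already taken
    for b in "$archive"/db_* "$archive"/rb_*; do
        if [ -d "$b" ]; then
            id=$(bucket_id "$b")
            case "$used" in
                *" $id "*) basename "$b" ;;
            esac
        fi
    done
}
```

Any bucket it prints would need a new, unused ID in its folder name before being moved into thaweddb. The directories to pass (for example the index's db, colddb and thaweddb folders) depend on your own layout.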

0 Karma

lukejadamec
Super Champion

I don't know of a command to run multiples. I do them one at a time in separate command windows. The most I ever need to do is about 6 at a time. In your case, you might want to write a script to rebuild all db folders in your temp directory one at a time, and just let it run.

ChhayaV
Communicator

hi lukejadamec,
Is there any special command to run multiple rebuild processes, or any documentation? I'm getting hundreds of such db_xx_xx buckets. It'll take time to run them one by one.

0 Karma

lukejadamec
Super Champion

The rebuild command just needs "rebuild" and a path, as you have done, but the path needs to exist:
splunk rebuild "D:\Program Files\splunk\var\lib\splunk\defaultdb\thaweddb\temp_db_1364755264_1356979773_16"

Lastly, rebuild can work only on one bucket at a time, but you can run multiple rebuild processes (one cmd.exe window for each bucket). You will get errors about not being able to access the log file, but the rebuild should work.
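
On Unix, the "one window per bucket" idea could be sketched roughly as follows, assuming GNU find and xargs are available. rebuild_all is a hypothetical helper name; it starts one "splunk rebuild <bucket>" process per bucket directory and caps how many run at the same time.

```shell
#!/bin/bash

# Hypothetical helper: run "<splunk_bin> rebuild <bucket>" once for every
# db_*/rb_* directory directly under temp_dir, at most max_jobs at a time.
# Usage: rebuild_all SPLUNK_BIN TEMP_DIR [MAX_JOBS]
rebuild_all() {
    local splunk_bin="$1"
    local temp_dir="$2"
    local max_jobs="${3:-1}"   # default: one rebuild at a time

    # -print0 / -0 keep paths with spaces intact;
    # -n 1 gives each bucket its own process; -P limits parallelism;
    # -r (GNU xargs) skips running anything when no bucket is found
    find "$temp_dir" -mindepth 1 -maxdepth 1 -type d \
        \( -name 'db_*' -o -name 'rb_*' \) -print0 |
        xargs -0 -r -n 1 -P "$max_jobs" "$splunk_bin" rebuild
}

# Example (paths are placeholders):
#   rebuild_all /opt/splunk/bin/splunk /var/splunk/index_name/temp 4
```

As noted above, expect the log file access errors if Splunk is running while these rebuilds execute.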

0 Karma

ChhayaV
Communicator

"The last error means your path to the bucket was wrong."

I don't understand what's wrong with the path, or what exactly the rebuild command needs.

0 Karma

kristian_kolb
Ultra Champion

Between step 1 and step 2, did you change the bucket name from db_xxxx to temp_db_xxxx? Or where did the name temp_db_xxx come from? xcopy didn't do that automatically, no?

0 Karma