Several months of data have been exported via exporttool (using the csv option), and the data is now ready to be imported. I wrote a script that should do the trick, but I'm missing something on the Splunk side.
In the script I issue the command
/opt/splunk/bin/importtool /splunk/data/defaultdb/db /swap/export/db_1378961993_1378907854_240
The system thinks for about 5 minutes, then the directory /splunk/data/defaultdb/db/rawdata/ is created and the following files are in it:
-rw-------. 1 root root 173074314 Sep 23 02:04 journal.gz
-rw-------. 1 root root 31681 Sep 23 02:33 slicemin.dat
-rw-------. 1 root root 281189 Sep 23 02:33 slicesv2.dat
This is where I cannot seem to get anything good to happen. Restarting the Splunk instance does not do anything. I have asked for a metadata rebuild with
/opt/splunk/bin/splunk _internal call /data/indexes/defaultdb/rebuild-metadata-and-manifests
and that has had no effect.
Any help on what needs to happen to have Splunk read in the raw data Journal file?
What was happening was that the files were being put into /splunk/data/defaultdb/db directly (and the system then created rawdata under it).
I figured I would share my script, as it was very nice to have.
You will make 2 files, and you will need to chmod them to 700.
+++++++++++++++++++++++++++++++++++++ Import_lanch.bash +++++++++++++++++++++++++++++++++++++

#!/bin/bash
# Needed to run importtool; update SPLUNK_HOME if needed
SPLUNK_HOME=/opt/splunk
export SPLUNK_HOME

# Set the following variables
#SOURCEDIRECTORY="/swap/export"
SOURCEDIRECTORY=

# Need to update this to where your db directory is
SPLUNKDATADIR="/splunk/data"

# This will set up the system and ensure that it is ready for the decompress
FILEPROCESSCNT="$SOURCEDIRECTORY/FILEPROCESSCNT.foo"
rm -f $FILEPROCESSCNT
if ! [ -d $SPLUNKDATADIR/defaultdb/db/temp ]; then mkdir $SPLUNKDATADIR/defaultdb/db/temp ; fi
if ! [ -d $SPLUNKDATADIR/defaultdb/db/temp/files ]; then mkdir $SPLUNKDATADIR/defaultdb/db/temp/files ; fi

# This is the main call. For each file that matches the -name pattern,
# it will call Import_run.bash and import the file.
find $SOURCEDIRECTORY/defaultdb/ -name "db_??????????_??????????_????" -print0 | xargs -0 -n 1 ./Import_run.bash

# The below will be run after all the imports are done.
# Uncomment "splunk offline" if the system is part of a cluster.
#/opt/splunk/bin/splunk offline
/opt/splunk/bin/splunk restart
/opt/splunk/bin/splunk _internal call /data/indexes/defaultdb/rebuild-metadata-and-manifests

+++++++++++++++++++++++++++++++++++++ Import_run.bash +++++++++++++++++++++++++++++++++++++

#!/bin/bash
if ! [ -z "$1" ]; then
    # Set the following variables
    #SOURCEDIRECTORY="/swap/export"
    SOURCEDIRECTORY=

    # Leave blank if not a cluster
    #CLUSTERGUID="_CCCCCCC2-5050-4444-BBBA-AAAAAAAAAAAF"
    CLUSTERGUID=""

    # Need to update this to where your db directory is
    SPLUNKDATADIR="/splunk/data"

    FILECOUNT=`ls -1 $SOURCEDIRECTORY/defaultdb/db_??????????_??????????_???? | wc -l`
    FILEPROCESSCNT="$SOURCEDIRECTORY/FILEPROCESSCNT.foo"
    FPC=0
    [ -e $FILEPROCESSCNT ] && FPC=`cat $FILEPROCESSCNT | sed 's/^\([0-9]*\).*$/\1/'`

    # Import the exported bucket into the temp staging directory
    /opt/splunk/bin/importtool $SPLUNKDATADIR/defaultdb/db/temp/files $1

    FPC=`expr $FPC + 1`
    echo "$FPC of $FILECOUNT done with file $1"
    echo $FPC >$FILEPROCESSCNT

    # The "-f 5" matches the depth of a /swap/export/defaultdb/... path;
    # adjust it if your SOURCEDIRECTORY is nested differently.
    FILENAMEPRE=`echo $1 | cut -d "_" -f 1-3 | cut -d "/" -f 5`
    # Adjust 100 to a number that makes sense in your environment,
    # e.g. a number which is 50 or 100 greater or less than your current bucket IDs.
    FILENUM=`expr $FPC + 100`
    FILENAME=$FILENAMEPRE
    FILENAME+="_"
    FILENAME+=$FILENUM
    # Append the cluster GUID (empty if not clustered)
    FILENAME+=$CLUSTERGUID

    mkdir $SPLUNKDATADIR/defaultdb/db/$FILENAME
    # This assumes that your db files are in /splunk/data
    mv $SPLUNKDATADIR/defaultdb/db/temp/files/* $SPLUNKDATADIR/defaultdb/db/$FILENAME
    mv $1 $1.done
fi
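For reference, assuming both scripts are saved in the same directory (Import_lanch.bash calls Import_run.bash by relative path), a run might look like:

```shell
# Make both scripts executable by the owner only, then launch the import
chmod 700 Import_lanch.bash Import_run.bash
./Import_lanch.bash
```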
You cannot give Splunk a list of searchable locations for warm buckets as far as I know.
No need to rebuild the metadata alone. Just rebuild the whole bucket. Working from a copy, delete all files except journal.gz in the defaultdb/db/db_1378961993_1378907854_240/rawdata/ folder, and run the rebuild command on the db_1378961993_1378907854_240 directory.
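A sketch of that procedure, assuming a standard /opt/splunk install and the bucket path from the question (`splunk rebuild` regenerates a bucket's index and metadata files from journal.gz):

```shell
# Work on a copy of the bucket; keep only rawdata/journal.gz
cd /splunk/data/defaultdb/db/db_1378961993_1378907854_240
find . -mindepth 1 ! -path ./rawdata ! -path ./rawdata/journal.gz -delete

# Rebuild the bucket's tsidx and metadata files from the journal
/opt/splunk/bin/splunk rebuild /splunk/data/defaultdb/db/db_1378961993_1378907854_240
```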
I don't know what you mean by "I don't think any of this matters".
The directory matters.
/splunk/var/lib/default/db/rawdata is not a searchable directory.
The unique ID number does not have to be in order. Every time you restart Splunk it will create a manifest in each db directory that lists all the unique IDs. If there are two that are the same (not unique) then you will get an error to that effect. It does not matter if the date stamps are different, just the unique ID.
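Since the unique ID is the fourth underscore-separated field of the bucket directory name, a quick way to spot duplicates might be the following (illustrated on a throwaway directory with fabricated bucket names; in the original setup DB would be /splunk/data/defaultdb/db):

```shell
# List bucket IDs that occur more than once in an index's db directory
DB=$(mktemp -d)
mkdir "$DB/db_1378961993_1378907854_240" \
      "$DB/db_1378900000_1378800000_240" \
      "$DB/db_1378700000_1378600000_41"
ls "$DB" | awk -F_ '{print $4}' | sort | uniq -d   # prints the duplicated ID: 240
rm -rf "$DB"
```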
Also after I
mv /splunk/data/defaultdb/db/rawdata /splunk/data/defaultdb/db/db_1378961993_1378907854_240
do I need to do anything to tell the system that this is now a searchable directory, rebuild the metadata, or anything like that? (e.g. should I issue '/opt/splunk/bin/splunk _internal call /data/indexes/defaultdb/rebuild-metadata-and-manifests' or 'touch /splunk/data/defaultdb/db/meta.dirty' and then restart Splunk?)
I don't think any of this matters at this level, but I am doing clustering on this indexer, so I will use the splunk offline command, then the splunk restart command.
Do the file name sequence numbers need to be in order? They are currently in the 40 range, and if I import 240, what will happen when Splunk gets there? They will have different date stamps, so I'm not worried about directory collision, but will Splunk be OK with duplicate sequence numbers?
The journal.gz is the Splunk-readable version of the raw data. The slice files help Splunk find things inside the journal. Basically, it looks normal except for it being in the wrong place.
The /splunk/data/defaultdb/db/ directory should contain a bunch of directories with a name similar to the one you started with: db_1378961993_1378907854_240
And the rawdata folder with its journal.gz should be inside.
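For illustration, the expected layout after the move (using only the names from the question) would be roughly:

```
/splunk/data/defaultdb/db/
    db_1378961993_1378907854_240/
        rawdata/
            journal.gz
            slicemin.dat
            slicesv2.dat
```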