
importtool not importing data

Path Finder

Hello,
Several months of data have been exported via exporttool (using the csv option), and I'm now ready to import them. I wrote a script that should do the trick, but I'm missing something on the Splunk side.

In the script I issue the command

/opt/splunk/bin/importtool /splunk/data/defaultdb/db /swap/export/db_1378961993_1378907854_240

The system thinks for about 5 minutes, then the directory /splunk/data/defaultdb/db/rawdata/ is created with the following files in it.

-rw-------. 1 root root 173074314 Sep 23 02:04 journal.gz

-rw-------. 1 root root 31681 Sep 23 02:33 slicemin.dat

-rw-------. 1 root root 281189 Sep 23 02:33 slicesv2.dat

This is where I cannot seem to get anything good to happen. Restarting the Splunk instance does not do anything. I have requested a metadata rebuild

/opt/splunk/bin/splunk _internal call /data/indexes/defaultdb/rebuild-metadata-and-manifests

and that has had no effect.

Any help on what needs to happen to have Splunk read in the raw data Journal file?


Path Finder

What was happening was that the files were being put into /splunk/data/defaultdb/db (and the system then created rawdata under this).

I figured I would share my script, as it was very handy to have.

You will create two files:
Import_launch.bash
Import_run.bash

You will need to chmod them to 700.

+++++++++++++++++++++++++++++++++++++
Import_launch.bash
+++++++++++++++++++++++++++++++++++++
#!/bin/bash
#needed to run importtool
    #update SPLUNK_HOME if needed
    SPLUNK_HOME=/opt/splunk
    export SPLUNK_HOME

#Set the following variables
    #SOURCEDIRECTORY="/swap/export"
    SOURCEDIRECTORY=

    #Need to update this to where your db directory is.
    SPLUNKDATADIR="/splunk/data"

#This will set up the system and ensure that it is ready for the decompress.
    FILEPROCESSCNT="$SOURCEDIRECTORY/FILEPROCESSCNT.foo"
    rm -f "$FILEPROCESSCNT"
    mkdir -p "$SPLUNKDATADIR/defaultdb/db/temp/files"

#This is the main call.  For each file that matches the -name pattern, it will call Import_run.bash and import the file.
    find "$SOURCEDIRECTORY/defaultdb/" -name "db_??????????_??????????_????" -print0 | xargs -0 -n 1 ./Import_run.bash

#The below will be run after all the imports are done.
    #Uncomment the next line if the system is part of a cluster.
    #/opt/splunk/bin/splunk offline
    /opt/splunk/bin/splunk restart
    /opt/splunk/bin/splunk _internal call /data/indexes/defaultdb/rebuild-metadata-and-manifests

+++++++++++++++++++++++++++++++++++++
Import_run.bash
+++++++++++++++++++++++++++++++++++++

#!/bin/bash
if [ -n "$1" ]; then

#Set the following variables
    #SOURCEDIRECTORY="/swap/export"
    SOURCEDIRECTORY=

    #leave blank if not cluster
    #CLUSTERGUID="_CCCCCCC2-5050-4444-BBBA-AAAAAAAAAAAF"
    CLUSTERGUID=""

    #Need to update this to where your db directory is.
    SPLUNKDATADIR="/splunk/data"

    FILECOUNT=`ls -l $SOURCEDIRECTORY/defaultdb/db_??????????_??????????_???? | wc -l`
    FILEPROCESSCNT="$SOURCEDIRECTORY/FILEPROCESSCNT.foo"
    FPC=0
    [ -e $FILEPROCESSCNT ] && FPC=`cat $FILEPROCESSCNT | sed 's/^\([0-9]*\).*$/\1/'`

    /opt/splunk/bin/importtool  $SPLUNKDATADIR/defaultdb/db/temp/files $1
    FPC=`expr $FPC + 1`
    echo "$FPC of $FILECOUNT done with file $1"
    echo $FPC >$FILEPROCESSCNT

    FILENAMEPRE=`basename $1 | cut -d "_" -f 1-3`

    #Adjust 100 to a number that makes sense in your environment 
    # e.g.  a number which is 50 or 100 greater or less than your current numbers.
    FILENUM=`expr $FPC + 100`

    FILENAME+=$FILENAMEPRE
    FILENAME+="_"
    FILENAME+=$FILENUM
    FILENAME+=$CLUSTERGUID

    mkdir $SPLUNKDATADIR/defaultdb/db/$FILENAME

    #This assumes that your db files are in /splunk/data
    mv $SPLUNKDATADIR/defaultdb/db/temp/files/* $SPLUNKDATADIR/defaultdb/db/$FILENAME

    mv $1 $1.done
fi
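The bucket-name assembly in Import_run.bash can be sketched on its own. This is a minimal, standalone demo of the same string logic; the sample export path and the 100 offset are hypothetical stand-ins:

```shell
# Hypothetical exported bucket path, as produced by exporttool
SRC=/swap/export/defaultdb/db_1378961993_1378907854_0042

# db_<latest>_<earliest> prefix; basename avoids a fragile `cut -d "/" -f 5`
FILENAMEPRE=$(basename "$SRC" | cut -d "_" -f 1-3)

# Offset the sequence number so it does not collide with IDs the live index already uses
FILENUM=$(expr 42 + 100)

# Empty for a non-clustered indexer; set to _<GUID> for a clustered one
CLUSTERGUID=""

echo "${FILENAMEPRE}_${FILENUM}${CLUSTERGUID}"   # → db_1378961993_1378907854_142
```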


Super Champion

As far as I know, you cannot give Splunk a list of searchable locations for warm buckets.
There is no need to rebuild the metadata. Just rebuild the whole bucket: working from a copy, delete all files except journal.gz in the defaultdb/db/db_1378961993_1378907854_240/rawdata/ folder, and run the rebuild command on the db_1378961993_1378907854_240 directory.
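A rough sketch of that procedure, run against a scratch copy so nothing live is touched (the /tmp path and the touch'd stand-in files are made up for the demo; point BUCKET at your real copy):

```shell
# Scratch copy of the exported bucket (assumption: adapt BUCKET to your copy)
BUCKET="${BUCKET:-/tmp/bucket_demo/db_1378961993_1378907854_240}"
mkdir -p "$BUCKET/rawdata"
touch "$BUCKET/rawdata/journal.gz" \
      "$BUCKET/rawdata/slicemin.dat" \
      "$BUCKET/rawdata/slicesv2.dat"

# Keep only journal.gz; the rebuild regenerates everything else from it
find "$BUCKET/rawdata" -type f ! -name journal.gz -delete

# Run the actual rebuild (skipped here if Splunk is not installed)
if [ -x /opt/splunk/bin/splunk ]; then
    /opt/splunk/bin/splunk rebuild "$BUCKET"
fi
```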


Super Champion

I don't know what you mean by "I don't think any of this matters".
The directory matters.
/splunk/var/lib/default/db/rawdata is not a searchable directory.
The unique ID number does not have to be in order. Every time you restart Splunk it will create a manifest in each db directory that lists all the unique IDs. If there are two that are the same (not unique) then you will get an error to that effect. It does not matter if the date stamps are different, just the unique ID.
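A quick way to spot colliding bucket IDs before restarting is to compare the fourth underscore-separated field of each bucket directory name. Scratch demo below; the /tmp directory and bucket names are made up:

```shell
# Demo index directory with one duplicated bucket ID (40)
DB="${DB:-/tmp/iddemo/defaultdb/db}"
mkdir -p "$DB/db_1_2_40" "$DB/db_3_4_40" "$DB/db_5_6_41"

# The 4th underscore-separated field is the bucket ID; print any duplicates
ls "$DB" | awk -F_ '/^db_/ {print $4}' | sort | uniq -d   # → 40
```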


Path Finder

Also after I
mkdir /splunk/data/defaultdb/db/db_1378961993_1378907854_240
and
mv /splunk/data/defaultdb/db/rawdata /splunk/data/defaultdb/db/db_1378961993_1378907854_240

do I need to do anything to tell the system that this is now a searchable directory, rebuild the metadata, or anything like that? (e.g. should I issue '/opt/splunk/bin/splunk _internal call /data/indexes/defaultdb/rebuild-metadata-and-manifests' or 'touch /splunk/data/defaultdb/db/meta.dirty' and then restart Splunk?)


Path Finder

I don't think any of this matters at this level but I am doing clustering on this indexer, so I will use the splunk offline, then the splunk restart command.

Do the file name sequence numbers need to be in order? They are currently in the 40 range, and if I import 240, what will happen when Splunk gets there? They will have different date stamps, so I'm not worried about directory collisions, but will Splunk be OK with duplicate sequence numbers?


Super Champion

The journal.gz is the Splunk-readable version of the raw data. The slice files help Splunk find things inside the journal. Basically, it looks normal except that it is sitting
here: /splunk/data/defaultdb/db/rawdata/
The /splunk/data/defaultdb/db/ directory should contain a bunch of directories with names similar to the one you started with: db_1378961993_1378907854_240
And the rawdata folder with its journal.gz should be inside each of those.
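That layout can be sanity-checked with a quick loop over the bucket directories. Scratch demo below; the /tmp path stands in for your real index directory:

```shell
# Scratch copy of the expected layout (the path is a stand-in)
DB="${DB:-/tmp/layoutdemo/defaultdb/db}"
mkdir -p "$DB/db_1378961993_1378907854_240/rawdata"
touch "$DB/db_1378961993_1378907854_240/rawdata/journal.gz"

# Every db_* bucket directory should carry rawdata/journal.gz inside it
for b in "$DB"/db_*; do
    if [ -f "$b/rawdata/journal.gz" ]; then
        echo "ok: $(basename "$b")"
    else
        echo "missing journal: $(basename "$b")"
    fi
done
```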
