When the filesystem that Splunk uses to store its indexes becomes unavailable or goes into read-only mode, or when Splunk crashes, inconsistencies are sometimes introduced in the metadata files of some indexes and buckets. These files are typically Sources.data, Hosts.data and SourceTypes.data. There is a set of these at the root of the index hot/warm directory, and another in each bucket.
The presence of a corrupt metadata file in a bucket of one of the indexes currently in use will keep Splunk from restarting. Typically, errors like the one below will show up in $SPLUNK_HOME/var/log/splunk/splunkd.log, and Splunk will crash when attempting to start:
ERROR WordPositionData - couldn't parse hash code
Unfortunately, although splunkd.log reports which index contains a corrupt metadata file as Splunk starts, it does not indicate which bucket that file lives in, or which file it is.
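A quick way to locate those errors, along with the nearby log lines that usually name the affected index, is to grep splunkd.log directly; a minimal sketch, assuming the error text matches the message shown above:

# Show each occurrence of the error with a couple of lines of context
grep -B 2 -A 2 "couldn't parse hash code" "$SPLUNK_HOME/var/log/splunk/splunkd.log"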
Is there a way to quickly scan an index and all of its buckets to detect which metadata files are corrupted and need to be moved out of the way?
There is a command that ships with Splunk which is capable of checking the consistency of the metadata files of any given index or bucket:
$SPLUNK_HOME/bin/splunk cmd recover-metadata {path_to_index|path_to_bucket} --validate
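For example, to validate just the hot/warm directory of the default index on a stock install:

$SPLUNK_HOME/bin/splunk cmd recover-metadata $SPLUNK_HOME/var/lib/splunk/defaultdb/db --validate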
Note that the "--validate" option essentially acts like "fsck -n": it reports errors but does not make any changes. For a given index, I like to run the script below to check the metadata files at the root of the hot/warm db, and then those contained in each bucket:
for i in `find "$PATH_TO_INDEX" \( -name 'db_*_*_*' -o -name 'hot_v*_*' \)`; do echo "Checking metadata in bucket $i ..."; $SPLUNK_HOME/bin/splunk cmd recover-metadata "$i" --validate; done; $SPLUNK_HOME/bin/splunk cmd recover-metadata `echo "$i" | sed 's/\(.*\)\/db_[^/]*$/\1/'` --validate
or fanned out for readability (at least, as readable as shell scripts get):
for i in `find "$PATH_TO_INDEX" \( -name 'db_*_*_*' -o -name 'hot_v*_*' \)`; do
  echo "Checking metadata in bucket $i ..."
  $SPLUNK_HOME/bin/splunk cmd recover-metadata "$i" --validate
done
# $i still holds the last bucket path; strip the bucket name to get the hot/warm db root
$SPLUNK_HOME/bin/splunk cmd recover-metadata `echo "$i" | sed 's/\(.*\)\/db_[^/]*$/\1/'` --validate
"PATH_TO_INDEX" should be the path to the directory of the affected index containing the "db" and "colddb" directories. For the default index ("main"), it is "$SPLUNK_HOME/var/lib/splunk/defaultdb".
Each time an error is reported, the corresponding .data file should be moved out of the way or deleted; Splunk will rebuild it on the next startup.
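For example, assuming the check flagged Sources.data in one of the warm buckets (the bucket name below is made up; use the path the checker printed):

# Rename rather than delete, so the file can be restored if needed;
# Splunk regenerates it at the next startup either way
mv "$PATH_TO_INDEX/db/db_1331400000_1331300000_42/Sources.data" \
   "$PATH_TO_INDEX/db/db_1331400000_1331300000_42/Sources.data.corrupt"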
Another solution is to create a "meta.dirty" file at the root of the affected index's hot/warm db ($SPLUNK_HOME/var/lib/splunk/defaultdb/db/ for example), which will also prompt Splunk to rebuild the metadata files for that index.
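For the default index, that is simply:

touch $SPLUNK_HOME/var/lib/splunk/defaultdb/db/meta.dirty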
Once all corrupted metadata files have been removed, the check should be run again. It will report errors for the files that were moved away, because they can no longer be found, but Splunk should now be ready to start.
Repeat the operation for each index for which splunkd.log reports this type of error.
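If several indexes are affected, the whole procedure can be wrapped in an outer loop; a minimal sketch, assuming all your indexes live under the default $SPLUNK_HOME/var/lib/splunk location (adjust if you use a custom SPLUNK_DB), and checking the hot/warm root directly instead of deriving it with sed:

for idx in "$SPLUNK_HOME"/var/lib/splunk/*; do
  [ -d "$idx/db" ] || continue   # only keep directories that look like index stores
  echo "=== Checking index $idx ==="
  for i in `find "$idx" \( -name 'db_*_*_*' -o -name 'hot_v*_*' \)`; do
    $SPLUNK_HOME/bin/splunk cmd recover-metadata "$i" --validate
  done
  # also check the metadata files at the root of the hot/warm db
  $SPLUNK_HOME/bin/splunk cmd recover-metadata "$idx/db" --validate
done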
As a corollary to the metadata checker above, the following can be used to check the health of your tsidx (time-series index) files.
for tsidx_file in $(find "$PATH_TO_INDEX" -type f -name '*.tsidx'); do
  # Capture the output; tsidxprobe returns a nonzero exit code on failure
  output="$($SPLUNK_HOME/bin/splunk cmd tsidxprobe "$tsidx_file")"
  tsidxprobe_exit_code=$?
  if [ $tsidxprobe_exit_code -ne 0 ]; then
    echo "tsidxprobe error: $tsidx_file gave an error; return code: $tsidxprobe_exit_code"
    echo "$output"
  fi
done
The main idea here is that tsidxprobe returns a nonzero exit code on failure; since its output is hard to predict, the script captures it and only prints it when a check fails.
NOTE: I tried this on Splunk 4.2.4 and it reported that "recover" was removed.
Do note that in most cases, it's the metadata files in the index root directory and/or in its hot buckets that are responsible for this situation.