Rawdata contains non-compressed file

MaverickT
Communicator

Due to a migration from a stand-alone indexer (on Windows) to a three-node indexer cluster (CentOS), we are doing a test migration of an index. We managed to copy all buckets to one of the new nodes (indexer01), appended the instance GUID to the end of the bucket folders, started the Splunk instances, exited maintenance mode and waited for the replication to happen. Everything worked like a charm until...
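
For reference, the rename we applied follows the standard clustered bucket naming scheme; the timestamps, bucket ID and GUID below are only placeholder examples:

# single-instance bucket:  db_<latest epoch>_<earliest epoch>_<bucket id>
# clustered bucket:        db_<latest epoch>_<earliest epoch>_<bucket id>_<originating peer GUID>
mv db_1526515073_1526386468_18 db_1526515073_1526386468_18_C221DE6A-20A8-41A8-A8D2-27E1F7A4B043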

We tried to search this newly migrated index for the period "All time" on the cluster. The other two nodes returned a warning such as:

 

[indexer02] Failed to read size=257 event(s) from rawdata in bucket='qualys~18~C221DE6A-20A8-41A8-A8D2-27E1F7A4B043' path='/opt/splunkcold/qualys/colddb/rb_1526515073_1526386468_18_A221DE6B-20A00-41A8-A8D2-27E1F7A4B043'. Rawdata may be corrupt, see search.log. Results may be incomplete!

 

 

We tried repairing the specific bucket with the splunk rebuild command and re-initiated the replication, but the result was the same. So we dug a little deeper and found that the rawdata folder, which should contain journal.gz, slicemin.dat and slicesv2.dat, also contains a strange plain-text, non-compressed file with raw events. It is named with a number which doesn't tell us much.
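
To get an idea of how widespread this is, a quick find over the index volumes lists every bucket whose rawdata contains such a numeric file (the paths below are from our layout; adjust them to yours):

# list leftover numeric slice files sitting next to journal.gz
find /opt/splunkhot /opt/splunkcold -type f -regex '.*/rawdata/[0-9]+'
# count the affected buckets
find /opt/splunkhot /opt/splunkcold -type f -regex '.*/rawdata/[0-9]+' | sed 's|/rawdata/.*||' | sort -u | wc -l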

 

The question is the following: what is this file? It is located only on indexer01 and it is not being replicated to the other nodes. Is there any way to append this file to journal.gz, or to force the replication of this file as well?


MaverickT
Communicator

Just an update: we are in contact with Splunk Support. This strange plain-text, non-compressed file is a temporary slice file to which fresh events are written. When the slice is "full", it is appended to journal.gz. When a bucket rolls from hot to warm, the slice is merged into journal.gz. But if the indexer crashes, the slice is not merged, so the bucket is left with both journal.gz and the slice file.
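
Since the slice is plain text while journal.gz is gzip-compressed, it is easy to tell them apart directly in the rawdata folder (the bucket path and the numeric slice filename below are only examples):

# journal.gz reports as gzip data, the leftover slice as plain text
file /opt/splunkcold/qualys/colddb/db_1526515073_1526386468_18_*/rawdata/*
# peek at the raw events that never made it into journal.gz (example slice name)
head -n 3 /opt/splunkcold/qualys/colddb/db_1526515073_1526386468_18_*/rawdata/12345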

Now we are working on a way to merge these slices with journal.gz as efficiently as possible. Will keep you posted.


MaverickT
Communicator

After a small amount of scripting and an even greater amount of waiting, we finally managed to get those slice files merged into journal.gz. We tried uncompressing journal.gz and simply appending the slice file, but it didn't work. The final solution was to use "splunk cmd exporttool" and "splunk cmd importtool". Because the number of buckets that needed to be fixed was above 2500, we adapted the export/import script that was published on the Splunk wiki years ago. I am sharing it here in case somebody needs it in the future.
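
For a single bucket, the export/import round-trip looks roughly like this (the bucket path and temporary file locations are only examples); the full script we used for all the buckets follows below:

# export the damaged bucket (journal.gz plus the leftover slice) to CSV
/opt/splunk/bin/splunk cmd exporttool /opt/splunkcold/qualys/colddb/db_1526515073_1526386468_18 /tmp/export_bucket_18.csv -csv
# import the CSV into a freshly built bucket directory
/opt/splunk/bin/splunk cmd importtool /tmp/new_bucket_18 /tmp/export_bucket_18.csv
# the new bucket then replaces the old one, renamed to db_<latest>_<earliest>_<id>_<GUID>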

 

#!/bin/bash
: '
This is a bucket fixup script. It fixes buckets with leftover slices by using the export/import commands.
It also renames single-instance buckets to the cluster format (by appending the instance GUID).

Author: Žiga Humar, Our Space Appliances
The author takes no responsibility for this script or for any data corruption it might cause.

Thanks to jrodman, whose script was a good starting point, and to Christian Bran at Splunk Support who explained
the logic behind leftover slices to me.
'

# EDIT YOUR VARIABLES HERE
BUCKET_TMPDIR=/tmp
SPLUNK_HOME=/opt/splunk
SPLUNK_BIN=/opt/splunk/bin/splunk
INSTANCE_GUID="C221DE6A-20A8-41B8-A8D2-27E1F7A4B0B8"
PATHS=(splunkhot splunkcold)
# VARIABLES FINISHED


EXPORT_CMD="$SPLUNK_BIN cmd exporttool"
IMPORT_CMD="$SPLUNK_BIN cmd importtool"



declare -a index_list
# build the list of indexes to process (index names are taken from the first configured path)
for path in ${PATHS[0]};
do
	for index in /opt/$path/*;
	do
		index_name=$(basename $index)
		if [ -d $index ]  && [ ${index_name:0:1}  != "_" ] && [ $index_name != "audit" ]
		then
			index_list+=($index_name)
		fi
	done
done	

# loop through the hot and cold paths
for path in ${PATHS[@]};
do
	echo "$(date) Processing path=/opt/$path/"
	# loop through the indexes
	for index in /opt/$path/*;
	do
		index_name=$(basename $index)
		
		# check if this index should be processed by this instance
		index_found=0
		for iteration_index in "${index_list[@]}"
		do
			if [ "$iteration_index" == "$index_name" ] ; then
				index_found=1
			fi
		done

		# if this is a folder and it should be processed, keep going
		if [ -d $index ]  && [ $index_found == 1 ]
		then
			echo "$(date) Processing index: $index_name"
			for bucket in $index/*/db_*;
			do
				bucket_dir=$(dirname $bucket)
				bucket_name=$(basename $bucket)	
				if [ -d $bucket ] && [ ${bucket_name:0:2}  == "db" ]
				then
					bucket_id_guid=$(echo $bucket_name | sed 's/db_[0-9]*_[0-9]*_//')
					bucket_guid=$(echo $bucket_id_guid | sed 's/[0-9]*//')
					bucket_id=$(echo $bucket_id_guid | sed 's/_[0-9A-Za-z-]*$//')
					
					
					#echo "$(date) Checking bucket=$bucket_id index=$index_name"
					#echo "$(date) Guid: ${bucket_guid}" 					
					
					# If the rawdata folder contains an uncompressed slice file, do the export/import procedure
					if [ $(find $bucket -type f -regex '.*rawdata/[0-9]+$' | wc -l ) != "0" ] ; 
					then
						echo "$(date) FIXUP task for bucket=${bucket_id} index=$index_name required"
						echo "$(date) Exporting bucket=${bucket_id} index=$index_name"
						NEW_BUCKET=$BUCKET_TMPDIR/new_bucket_${index_name}_${bucket_id}
						EXPORTING_BUCKET=$BUCKET_TMPDIR/export_bucket_${index_name}_${bucket_id}.csv

						# delete old export files (just in case they are left from previous migration)
						rm -Rf $NEW_BUCKET
						rm -Rf $EXPORTING_BUCKET
					
						# do export
						SECONDS=0
						$EXPORT_CMD $bucket $EXPORTING_BUCKET -csv 
						EXPORT_ENDTIME=$(date +%s)
						duration_export=$SECONDS
						echo "Export took $duration_export seconds."
						
						# do import
						echo "$(date) Reimporting bucket=${bucket_id}  index=$index_name"	
						SECONDS=0						
						$IMPORT_CMD $NEW_BUCKET $EXPORTING_BUCKET
						duration_import=$SECONDS
						echo "Reimport took $duration_import seconds."
						
						# go into new bucket and get earliest and latest time in the bucket
						(cd $NEW_BUCKET; ls *.tsidx | sed 's/-[0-9]\+\.tsidx$//' |sed 's/-/ /') | {
							global_low=0
							global_high=0
							while read high low; do
								if [ $global_high -eq 0 ] || [ $high -gt $global_high ]; then
									global_high=$high
								fi
								if [ $global_low -eq 0 ] || [ $low -lt $global_low ]; then
									global_low=$low
								fi
							done
							REAL_BUCKET_NAME=db_${global_high}_${global_low}_${bucket_id}_${INSTANCE_GUID}
							
							# move the old bucket to temporary location
							if [ -d $bucket ]; 
							then
								mv $bucket $BUCKET_TMPDIR
							else
								echo >&2 "bucket $bucket vanished while processing... inserting the new one and hoping for the best"
							fi
							# replacing old bucket with a new one
							echo "Replacing bucket=${bucket_id} index=$index_name"		
							mv $NEW_BUCKET $bucket_dir/$REAL_BUCKET_NAME
						}
						
						# delete temporary export file and the old one.
						rm -rf $BUCKET_TMPDIR/$bucket_name # delete old one
						rm $EXPORTING_BUCKET # delete exported one
					
					# if the bucket folder doesn't end with the INSTANCE_GUID, let's append it
					elif [ "$bucket_guid" != "_${INSTANCE_GUID}" ];
					then
						echo "$(date) Renaming bucket=${bucket_id} from single instance to cluster format index=$index_name" 
						mv $bucket ${bucket}_${INSTANCE_GUID}
					fi
				fi
			done
		fi
	done
done
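
If you want to reuse the script: adjust the variables at the top first, then run it on the indexer that holds the buckets and capture the output somewhere (the filename below is simply whatever you save the script as):

chmod +x bucket_fixup.sh
./bucket_fixup.sh 2>&1 | tee /tmp/bucket_fixup.log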



 

Just for info: we had 30% of our buckets in this state. The total size of all indexes is 3 TB, and it took us 4 days to run the script. After running it, there were also a couple of other corrupted buckets that had to be exported in the same fashion, but that was a fast and easy job.
