Getting Data In

Frozen data deletion script - deletes data older than 12 months

Builder

Hello guys,

I've created a shell script, scheduled with cron-like software, which deletes data older than 12 months, except for one special index (13 months). Could someone comment on it?

Calling method : ./script.sh /var/frozen/ 2

Note: if you use it, it's at your own risk 🙂

Thanks.

#!/bin/bash
# Frozen buckets detection
#
# $1 = first arg : main directory
# $2 = second arg : find depth

cd ~ || exit 1

curDateEpo=$(date +%s)
curDate=$(date -d "@$curDateEpo")

# 12 months in seconds
beforeDateEpo=$((curDateEpo - 31536000))
# Special 13 months 10/09/2019
beforeDateEpoSpecial=$((curDateEpo - 34165800))

find "$1" -maxdepth "$2" -type d | while read -r i
do
    # Bucket dirs are named db_<latestEpoch>_<earliestEpoch>_<id>:
    # replace non-digits with spaces and let read split out the numbers
    read -r latest earliest <<< "${i//[^0-9]/ }"
    earliest=${earliest:0:10}

    if [[ $earliest =~ ^[0-9]+$ ]]
    then
        earliestH=$(date -d "@$earliest" +%Y/%m/%d)

        # Pick the threshold per iteration so the special 13-month period
        # does not leak into the indexes processed afterwards (CASE SENSITIVE!!!)
        threshold=$beforeDateEpo
        if [[ $i == *"ppr_app_special/"* ]]
        then
            threshold=$beforeDateEpoSpecial
            printf '***** special detected in %s so applying %s period *****\n' "$i" "$threshold" >> splunk_frozen_script.log
        fi

        if [ "$earliest" -lt "$threshold" ]
        then
            ### PURGE ###
            rm -r "$i"
            printf '%s;%s;%s;DELETED\n' "$curDate" "$i" "$earliestH" >> splunk_frozen_script.log
            echo "Some buckets have been deleted, logged in splunk_frozen_script.log"
        fi
    fi
done

printf '%s;%s;EXEC_FINISHED\n' "$curDate" "$curDate" >> splunk_frozen_script.log
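For reference, the `read … <<< ${i//[^0-9]/ }` trick works because frozen bucket directories are named `db_<latestEpoch>_<earliestEpoch>_<id>`: replacing every non-digit with a space leaves the numbers, which `read` then splits into fields. A minimal sketch with a made-up bucket path (the epochs and id here are invented for illustration):

```shell
#!/bin/bash
# Hypothetical frozen bucket path, db_<latestEpoch>_<earliestEpoch>_<id>
i="/var/frozen/ppr_app_special/db_1606780800_1577836800_42"

# Replace every non-digit with a space, then let read split the fields
read -r latest earliest id <<< "${i//[^0-9]/ }"

echo "latest=$latest earliest=$earliest id=$id"
```

Note that if the base path itself contained digits (e.g. `/var/frozen2/`), those would land in `$latest` instead, which is why the script truncates `$earliest` to 10 characters as a safety net.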

Explorer

Here is a script I created that should do what you are asking! It also gives you the ability to delete from all indexes on a frozen path. The initial script came from seigex on Reddit: https://www.reddit.com/r/Splunk/comments/86lxao/script_to_delete_frozen_data_based_on_epoch_time/

The expected frozen path is {frozen-path}/{index-name}/{bucket_folder}. Be sure to insert your own log file and frozen path.

#!/bin/bash
#Script Master: Powers64
#Schedule: Cronjob is set to run this script @ ...
#Purpose: Check for db buckets in each index in frozen storage that are older than the needed retention date (based on epoch), then delete them if found

TODAY=$(date)
#useful if you want to monitor and create a dashboard from the logs, comma-delimited.
CLEAN_LOG=[.../removed_frozen_bucket_by_epoch.log]

#Offline retention is set to 1 year
RETENTION_BY_EPOCH=$(date --date="1 year ago" +%s)

echo "Retention is set to: $RETENTION_BY_EPOCH"
SPLUNK_FROZEN_PATH="[/frozen/mount/.../splunk_bucket_backups/frozen_buckets]"

##Pulls list of all index folders
cd "$SPLUNK_FROZEN_PATH" || exit 1
for line in $(ls -d */ -1 | cut -f1 -d'/'); do

    cd "$SPLUNK_FROZEN_PATH/$line/" || continue
    #Skip the folder if it is empty
    if [ "$(ls -A "$SPLUNK_FROZEN_PATH/$line/")" ];
    then
        #check only db_ folders, avoids touching inflight folders
        for d in db_* ; do
            # Bucket folders are named db_<latestEpoch>_<earliestEpoch>_<id>
            START_EPOCH="$(cut -d'_' -f3 <<< "$d")"
            END_EPOCH="$(cut -d'_' -f2 <<< "$d")"
            BUCKET_NUM="$(cut -d'_' -f4 <<< "$d")"
            BUCKET_SIZE="$(du -ch "$SPLUNK_FROZEN_PATH/$line/$d" | grep total | cut -b 1-4)"

            # Numeric comparison (-lt), not the string operator \<
            if [ "$END_EPOCH" -lt "$RETENTION_BY_EPOCH" ] && [ "$START_EPOCH" != 0 ];
            then
                echo "the following bucket will be deleted: $SPLUNK_FROZEN_PATH/$line/$d"
                echo -e "$TODAY,index=$line,bucket_folder=$d,bucket_num=$BUCKET_NUM,bucket_size=$BUCKET_SIZE,earliest_epoch=$START_EPOCH,latest_epoch=$END_EPOCH,set_retention=$RETENTION_BY_EPOCH" >> "$CLEAN_LOG"
                rm -rf "${SPLUNK_FROZEN_PATH:?}/$line/$d"
            fi
        done
    fi
done
echo "Frozen Storage now adheres to retention policy!"
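The `cut` parsing can be checked in isolation; note that epoch comparisons must be numeric (`-lt`), since the string operator `\<` would, for example, sort "9" after "10000". A quick sketch with a made-up bucket folder name:

```shell
#!/bin/bash
# Made-up db_<latestEpoch>_<earliestEpoch>_<id> folder name
d="db_1606780800_1577836800_12"

END_EPOCH="$(cut -d'_' -f2 <<< "$d")"    # latest event epoch
START_EPOCH="$(cut -d'_' -f3 <<< "$d")"  # earliest event epoch
BUCKET_NUM="$(cut -d'_' -f4 <<< "$d")"

echo "end=$END_EPOCH start=$START_EPOCH num=$BUCKET_NUM"

# Numeric comparison: delete only if even the latest event is past retention
RETENTION_BY_EPOCH=1700000000   # illustrative cutoff
if [ "$END_EPOCH" -lt "$RETENTION_BY_EPOCH" ]; then
    echo "bucket is past retention"
fi
```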

Engager

Hello @Powers64,

Thank you for sharing this script.

I have a question, please: I'm a bit confused about why we should specify {frozen-path}/{index-name}/{bucket_folder} as the path if the script deletes from all indexes and not a single one.

I see that the line ls -d */ -1 | cut -f1 -d'/' prior to the loop in your script allows listing of all indexes.

Can you please confirm that path={frozen_path} (where all indexes with frozen buckets are) ? 

Thank you very much for your support.

Regards.


SplunkTrust

Any specific reason for not utilizing Splunk's native data retention settings?


Builder

Hi, there is no native frozen data deletion script, or am I wrong? In our case we want 6 months online and 6 months archived for most indexes.


SplunkTrust

The default behavior of Splunk is to delete frozen buckets (no auto-archive). If you're keeping data searchable for, say, 12 months, and you set the retention period accordingly (frozenTimePeriodInSecs in indexes.conf), the data will be searchable for 12 months, after which Splunk starts deleting the buckets that have crossed the retention period. This can be set at the global level (so it applies to all indexes), and you can override it at the index level (for that one exception index). Again, this works for the case where you want data to be searchable for the whole retention period.
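For example, a 12-month global retention with a per-index override could look like this in indexes.conf (the index name here is illustrative; the 13-month value matches the one used in the script above):

```
[default]
# 12 months, in seconds; buckets older than this are frozen (deleted by default)
frozenTimePeriodInSecs = 31536000

[ppr_app_special]
# override for the one exception index: ~13 months
frozenTimePeriodInSecs = 34165800
```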

When you say you archive data for 6 months, do you mean you keep the frozen buckets for 6 months and then delete them using this script?

Builder

yes exactly


SplunkTrust

You're doing a custom thing, so a custom script is probably the best idea here. One final thing: you archive after 6 months, so the data is not searchable after 6 months; it's just kept there so that it can be restored if required. Would there be a problem if it stayed searchable? If you're looking for cost savings by reducing the bucket size, that can be achieved by tsidx reduction (depending on which version of Splunk you are using).
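If tsidx reduction turns out to be the better fit, it is also configured in indexes.conf; a sketch (the index name and time period below are illustrative, not a recommendation):

```
[my_index]
enableTsidxReduction = true
# reduce tsidx files for buckets older than ~3 months
timePeriodInSecBeforeTsidxReduction = 7776000
```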

Builder

That's a good question; we have different partitions and the data is split across HOT-WARM/COLD/FROZEN.

Is the script technically correct in your opinion?

Thanks 🙂
