<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search? in Knowledge Management</title>
    <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203516#M1803</link>
    <description>&lt;P&gt;Where does the &lt;CODE&gt;fill_summary_index.py&lt;/CODE&gt; Python script come from?&lt;/P&gt;</description>
    <pubDate>Sun, 12 Jun 2016 01:00:28 GMT</pubDate>
    <dc:creator>ddrillic</dc:creator>
    <dc:date>2016-06-12T01:00:28Z</dc:date>
    <item>
      <title>Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203515#M1802</link>
      <description>&lt;P&gt;I have created a bash script to automate backfilling missing data without overloading the server. However, when I increase the number of threads and the time span of each search, some backfills are skipped due to an error. Given the settings below (within the script), is there a best practice for how much data can be backfilled per thread/search?&lt;/P&gt;</description>
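&lt;P&gt;For context, here is a standalone sketch of the chunking arithmetic the script uses; the variable names and epoch values mirror the settings below:&lt;/P&gt;

```shell
# Worked example of the chunking arithmetic used by the script below
# (values match the settings in the script).
seconds=300                      # how often the search runs
maxq=10                          # max backfill queries per invocation
et=1463247600                    # earliest epoch time
lt=1463605200                    # latest epoch time
queries=$((seconds*maxq))        # seconds of data per invocation: 3000
runs=$(((lt-et)/queries))        # number of full invocations: 119
remaintime=$(((lt-et)%queries))  # leftover seconds for the final run: 600
echo "$queries $runs $remaintime"
```

&lt;P&gt;So with these settings each thread/search covers 3000 seconds (50 minutes) of data, i.e. 10 scheduled runs per invocation.&lt;/P&gt;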

&lt;PRE&gt;&lt;CODE&gt;#!/bin/bash
#This script backfills summary data for searches that failed (due to license issues or other problems) even though the raw data was already ingested into Splunk.

#Timestamp used for logs
_now=$(date +"%Y-%b-%d_%Hh_%Mm_%Ss")

############################
#Required Information Needed
############################
#Splunk path
splunk_dir=/opt/splunk/bin
#Log Path
log_dir=/opt/scripts/logs
#Splunk Username (not linux username) to run backfill script under
username=Powers64
#Name of Application the search resides in
app="SmartyPants"
#This needs to be typed again manually below; if the search has - in its name, passing it through this variable makes the script fail.
search_name="'Summary - SmartyPants - 5 minutes'"
#Search Earliest EPOCH time
et=1463247600
#Search Latest EPOCH time
lt=1463605200
#How often does the search run? [In Seconds]
seconds=300
#Max Backfill Queries in every Search
maxq=10
#When this option is set to true, the script does not run saved searches for a scheduled timespan if data already exists in the summary index for that timespan.
dedup="true"
#Specifies that the summary indexes are not on the search head but are on the indexers instead. To be used in conjunction with -dedup
nolocal="true"
#Maximum number of concurrent searches to run
concurrent=2
####
#For more info on managing backfill visit &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Managesummaryindexgapsandoverlaps" target="_blank"&gt;http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Managesummaryindexgapsandoverlaps&lt;/A&gt;
####
############################
#End of required Information
############################

echo "Please enter $username's Splunk password (Note: input is hidden; just press [Enter] after typing): "
read -s password

cd "$splunk_dir"
#Seconds of data covered by each backfill invocation (search interval * max queries)
queries=$(($seconds*$maxq))
#Length of the final run when the window is not evenly divisible by the invocation size
remaintime=$((($lt-$et)%$queries))

#Runs a recurring backfill search based on parameters above
for ((current=$et; current&amp;lt;$lt; current=current+$queries))
do

#Calculates remaining seconds to run. Identifies when to run last backfill search
lastrun=$(($lt-$current))

        if [ $lastrun -ne $remaintime ]
            then
                qrun=$(($current+$queries))
                completed=$(((($current-$et)*100)/($lt-$et)))
                echo "Running backfill from" $current "to" $qrun
                ./splunk cmd python fill_summary_index.py -app $app -name 'Summary - SmartyPants - 5 minutes' -et $current -lt $qrun -dedup $dedup -nolocal $nolocal -showprogress true -j $concurrent -auth $username:$password 2&amp;gt;&amp;amp;1 | tee $log_dir/$_now.output
                echo $_now "-" $app $search_name $current $qrun &amp;gt;&amp;gt; $log_dir/backfill_history.log
                echo $completed"% Complete - Pausing script for 15 seconds to avoid overloading the server"
                sleep 15
            else
                echo "Running last backfill from" $current "to" $lt
                ./splunk cmd python fill_summary_index.py -app $app -name 'Summary - SmartyPants - 5 minutes' -et $current -lt $lt -dedup $dedup -nolocal $nolocal -showprogress true -j $concurrent -auth $username:$password 2&amp;gt;&amp;amp;1 | tee $log_dir/$_now.output
                echo $_now "-" $app $search_name $current $lt &amp;gt;&amp;gt; $log_dir/backfill_history.log
                echo "100% Complete - Backfill completed! Yippee"
        fi

done
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 Jun 2016 18:26:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203515#M1802</guid>
      <dc:creator>Powers64</dc:creator>
      <dc:date>2016-06-10T18:26:27Z</dc:date>
    </item>
    <item>
      <title>Re: Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203516#M1803</link>
      <description>&lt;P&gt;Where does the &lt;CODE&gt;fill_summary_index.py&lt;/CODE&gt; Python script come from?&lt;/P&gt;</description>
      <pubDate>Sun, 12 Jun 2016 01:00:28 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203516#M1803</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2016-06-12T01:00:28Z</dc:date>
    </item>
    <item>
      <title>Re: Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203517#M1804</link>
      <description>&lt;P&gt;First, the script looks very good; it gives a lot of options to pick from. I do a lot of backfilling myself, based on search names, app, etc.&lt;/P&gt;</description>

&lt;P&gt;I have modified fill_summary_index.py to best suit a search head clustering environment, and I pick a time when scheduled activity is at a minimum. The -j option cannot usefully exceed the number of cores on the search head: I can put 1000 in there, but if my machine has only 16 cores, 16 searches is all it can run concurrently at any given time.&lt;/P&gt;
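&lt;P&gt;A minimal sketch of that core-count cap, assuming GNU coreutils' nproc is available (the requested value is illustrative):&lt;/P&gt;

```shell
# Clamp the requested -j concurrency to the machine's core count,
# since values above it cannot actually run concurrently anyway.
requested=1000        # illustrative: the -j value you might ask for
cores=$(nproc)        # logical core count (GNU coreutils)
if [ "$requested" -gt "$cores" ]; then
  concurrent=$cores
else
  concurrent=$requested
fi
echo "Using -j $concurrent"
```

&lt;P&gt;Note that nproc reports logical cores, so on a hyper-threaded box this may still overcommit physical cores.&lt;/P&gt;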

&lt;P&gt;I typically rely heavily on -dedup true, as it does not harm performance (it simply does not execute if the job has already run).&lt;/P&gt;

&lt;P&gt;That being said, there is no backfilling best practice per se. However, I pick a list of searches that have a lot in common (for example, schedule time ranges). If I have 10 searches that run 15 minutes apart, I will pick an -et/-lt that covers the search window for all 10 and use -dedup true to skip the ones that already ran.&lt;/P&gt;
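&lt;P&gt;That windowing trick can be sketched as follows; the epoch values are illustrative, and -dedup true then skips the timespans that already ran:&lt;/P&gt;

```shell
# Pick one -et/-lt pair that covers several related searches:
# the smallest earliest time and the largest latest time.
ets="1463248500 1463247600 1463249400"   # illustrative earliest times
lts="1463605200 1463607000 1463606100"   # illustrative latest times
et=$(echo $ets | tr ' ' '\n' | sort -n | head -n 1)
lt=$(echo $lts | tr ' ' '\n' | sort -n | tail -n 1)
echo "-et $et -lt $lt"
```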

&lt;P&gt;&lt;CODE&gt;./splunk cmd python fill_summary_index.py -app search -name "All the crazy summaries" -dedup true -showprogress true -j 16 -owner admin -auth admin:admin&lt;/CODE&gt; (16 concurrent searches is all my search head can handle).&lt;/P&gt;

&lt;P&gt;Since the script you wrote covers everything, the only way to improve performance is to run a few of the summary backfills from a different SHC member (if you have search head clustering), or even pooling. The reason I had to edit fill_summary_index.py is that I do not store any summary data on search heads; I forward everything from the SHC to the indexers.&lt;/P&gt;

&lt;P&gt;Hope this helps!&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
Raghav&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 09:56:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203517#M1804</guid>
      <dc:creator>Raghav2384</dc:creator>
      <dc:date>2020-09-29T09:56:31Z</dc:date>
    </item>
    <item>
      <title>Re: Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203518#M1805</link>
      <description>&lt;P&gt;Raghav2384, thanks for the reply. I noticed that when I try to backfill a search job that runs every 5 minutes with over 100k events per search, it errors out if I use a wide backfill time window. On the other hand, when I backfill a search job that runs every hour with 9k events per search, even a very large backfill window causes no issue.&lt;/P&gt;</description>

&lt;P&gt;I figured there is a limitation on how many events per search job can be backfilled.&lt;/P&gt;

&lt;P&gt;As for your change to fill_summary_index.py, there is a -nolocal argument that "Specifies that the summary indexes are not on the search head but are on the indexers instead. To be used in conjunction with -dedup".&lt;/P&gt;</description>
      <pubDate>Tue, 29 Sep 2020 09:56:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203518#M1805</guid>
      <dc:creator>Powers64</dc:creator>
      <dc:date>2020-09-29T09:56:39Z</dc:date>
    </item>
    <item>
      <title>Re: Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203519#M1806</link>
      <description>&lt;P&gt;It is a Splunk script to backfill data generated by running search jobs. &lt;BR /&gt;
&lt;A href="http://docs.splunk.com/Documentation/Splunk/6.4.1/Knowledge/Managesummaryindexgapsandoverlaps"&gt;http://docs.splunk.com/Documentation/Splunk/6.4.1/Knowledge/Managesummaryindexgapsandoverlaps&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 12:53:58 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203519#M1806</guid>
      <dc:creator>Powers64</dc:creator>
      <dc:date>2016-06-13T12:53:58Z</dc:date>
    </item>
    <item>
      <title>Re: Backfill automated bash script timeout: Is there a best practice on how much data can be backfilled per thread/search?</title>
      <link>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203520#M1807</link>
      <description>&lt;P&gt;This looks pretty good, so if you are still looking for performance/safety improvements, I suggest that you convert from summary indexing (SI) to accelerated data models + tstats.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 20:10:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Knowledge-Management/Backfill-automated-bash-script-timeout-Is-there-a-best-practice/m-p/203520#M1807</guid>
      <dc:creator>woodcock</dc:creator>
      <dc:date>2016-06-13T20:10:56Z</dc:date>
    </item>
  </channel>
</rss>

