
Splunk Backup of indexed data on S3 buckets


Hi Splunkers,

I have a clustered environment in AWS infrastructure, and I need to back up my data on a daily basis to ensure high availability. I have created a backup app on the indexers that schedules a daily script; the script takes a backup of my Splunk buckets and copies them to an S3 bucket in AWS.
How can I ensure that the backup contains no duplicate buckets?

Currently I am using this script:

# Roll hot buckets so current data lands in warm buckets before the backup
for i in `ls /opt/splunk/var/lib/splunk -I "*.dat" -I "_*"` ; do
    /opt/splunk/bin/splunk _internal call /data/indexes/$i/roll-hot-buckets -auth admin:password
done
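The `ls -I` exclude patterns decide which index directories get backed up, and the same patterns are repeated in every loop. As a sketch (the sample directory names below are made up for illustration), the selection could live in one reusable filter instead:

```shell
#!/bin/sh
# Sketch: reproduce the "-I" exclude patterns as a grep filter, so the
# same index selection can be shared by every loop in the backup script.
# Skips *.dat files, internal "_*" indexes, and Splunk housekeeping dirs.
filter_indexes() {
    grep -v -e '\.dat$' -e '^_' -e '^authDb$' -e '^hashDb$' \
            -e '^persistent' -e '^kvstore$'
}

# Illustration with made-up names; real use would be:
#   ls /opt/splunk/var/lib/splunk | filter_indexes
printf '%s\n' main _internaldb authDb kvstore web.dat myindex | filter_indexes
# → prints: main, myindex
```

Keeping the filter in one place means the hot-bucket roll and the two sync loops can never drift apart in which indexes they cover.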

# Incremental backup: sync warm (db) and cold (colddb) buckets to S3
for i in `ls /opt/splunk/var/lib/splunk -I "*.dat" -I "_*" -I "authDb" -I "persistent*" -I "hashDb" -I "kvstore"` ; do
    for j in `ls /opt/splunk/var/lib/splunk/$i/db | grep db_` ; do
        aws s3 sync /opt/splunk/var/lib/splunk/$i/db/$j s3://splunk-databackup/$i/$j >> /opt/splunk/etc/apps/backup/logs/backup_output.log
    done
done

for i in `ls /opt/splunk/var/lib/splunk -I "*.dat" -I "_*" -I "authDb" -I "persistent*" -I "hashDb" -I "kvstore"` ; do
    for j in `ls /opt/splunk/var/lib/splunk/$i/colddb | grep db_` ; do
        aws s3 sync /opt/splunk/var/lib/splunk/$i/colddb/$j s3://splunk-databackup/$i/$j >> /opt/splunk/etc/apps/backup/logs/backup_output.log
    done
done
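Two properties of the loops above already limit duplication: `aws s3 sync` copies only files that are new or changed at the destination, and a warm bucket keeps its `db_*` directory name when it rolls to colddb, so both loops write to the same S3 key `s3://splunk-databackup/$i/$j` (a `--dryrun` flag on `aws s3 sync` will preview what a re-run would actually copy). A remaining risk is the same bucket ID turning up twice in a candidate list, e.g. once from a db/ listing and again from colddb/ while a bucket is rolling. A sketch that dedupes by the bucket's unique ID before syncing (the directory-name layout is an assumption; verify it against your deployment):

```shell
#!/bin/sh
# Sketch: dedupe a candidate list by Splunk bucket ID before syncing.
# Assumed directory-name layout (check your own buckets):
#   db_<newestTime>_<oldestTime>_<localId>          standalone
#   db_<newestTime>_<oldestTime>_<localId>_<guid>   clustered
bucket_id() {
    # everything from the 4th "_"-separated field onward is the unique ID
    echo "$1" | cut -d_ -f4-
}

seen=" "
for b in db_1700000100_1700000000_42 \
         db_1700000100_1700000000_42 \
         db_1700000200_1700000150_43 ; do
    id=$(bucket_id "$b")
    case "$seen" in
        *" $id "*) echo "skip duplicate $b" ;;
        *) seen="$seen$id "
           echo "backup $b"   # the real script would run: aws s3 sync ... "$b"
           ;;
    esac
done
# → backs up 42 and 43 once each, skips the repeated 42
```

Because the sync destination is keyed by `$i/$j` rather than by db/ vs colddb/, deduping on the bucket ID keeps one S3 copy per bucket no matter which tier it was listed from.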

How do I ensure that there is no data duplication when the backup runs each time?
