I've come up with a plan to back up my Splunk v6.6.1 indexes that I'd like reviewed.
I don't have a big Splunk install: just three physical indexers, each with 8 TB of local disk, configured in an indexer cluster with a VM as the cluster master.
My data rates are not that big, but I need to preserve hundreds of indexes against possible corruption for several years.
The replication factor protects me against hardware failure, but it does not protect me against operator error: somebody deletes an index, or uploads too much data into an index where it was not supposed to go. So I need to keep a backup copy of my indexes, so that I can retrieve an index as it was, for example, one week ago.
The hot-bucket backup problem is not much of an issue for my application, since I don't have high data input rates. I can always force a roll from hot to warm before the once-a-day backup, or even ignore/lose the hot buckets altogether.
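Forcing that roll can be scripted per index through the Splunk CLI's internal REST call; a sketch (the install path and index name are assumptions, and this needs a running instance plus admin credentials):

```shell
# Sketch: force the hot buckets of index "myindex" to roll to warm before the
# nightly backup. "/opt/splunk" and "myindex" are assumptions -- adjust to your
# environment. Splunk will prompt for (or accept -auth) admin credentials.
/opt/splunk/bin/splunk _internal call /data/indexes/myindex/roll-hot-buckets -method POST
```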
I've set maxHotBuckets = 1 to ensure that I lose at most one hot bucket per index.
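In indexes.conf terms, that is simply (the index name here is only an example):

```ini
# indexes.conf -- allow at most one hot bucket per index,
# so at most one bucket is at risk during the daily backup.
# "myindex" is an illustrative stanza name.
[myindex]
maxHotBuckets = 1
```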
That way I avoid having to set up a ZFS filesystem on my indexers for snapshot-based backups; I just use regular filesystems.
So the primary copy of my index data is going to be split across three different machines on their local disks. Also, since I run with replication factor = 3 and search factor = 2, I will have multiple copies of the data (db_* and rb_* buckets) mixed together in the same directories. Writing a specialized backup script sounds complex (re-assemble a full copy of all the primary buckets of one index?) and fragile (how does the script get updated when I add one more indexer to my cluster?), so I'm ruling out a custom backup script.
I am relying on Splunk's internal replication mechanism to assemble that copy of all primary buckets in one place.
Instead of a simple single-site cluster, I've added another VM with some NFS disks attached (NFS disks can be backed up easily), and my indexer cluster is configured as a multisite indexer cluster:
site1 / MAIN:
indexmaster: VM, RF=3, SF=2
indexsvc1: Physical with 8-TB local disks
indexsvc2: Physical with 8-TB local disks
indexsvc3: Physical with 8-TB local disks
site2 / BKP:
indexbkp: VM, single machine indexer with 2 x 6-TB NFS partitions (backup1 & backup2) attached
and in the server.conf of the multisite cluster master, I add the following:
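Roughly like this (a sketch: the site names match the layout above, but the replication and search counts are placeholder assumptions to adapt):

```ini
# server.conf on the cluster master (sketch -- counts are assumptions)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# e.g. keep the original copies on the originating site plus
# one replicated copy on the backup site
site_replication_factor = origin:3,site2:1,total:4
site_search_factor = origin:2,total:2
```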
On the site1/main indexer cluster, those backup1 & backup2 paths correspond to plain directories under SPLUNK_DB.
On the site2/bkp indexer, backup1 & backup2 correspond to the two NFS partitions, enabling me to spread my indexes across the different NFS partitions for backup.
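On the backup indexer this can be expressed with volume stanzas in indexes.conf; a sketch (the mount points, size limits, and index name are assumptions):

```ini
# indexes.conf on the site2 backup indexer (sketch -- paths are assumptions)
[volume:backup1]
path = /mnt/nfs_backup1
maxVolumeDataSizeMB = 6000000

[volume:backup2]
path = /mnt/nfs_backup2
maxVolumeDataSizeMB = 6000000

# Example index spread across the two NFS-backed volumes.
# thawedPath cannot reference a volume, so it stays under SPLUNK_DB.
[myindex]
homePath = volume:backup1/myindex/db
coldPath = volume:backup2/myindex/colddb
thawedPath = $SPLUNK_DB/myindex/thaweddb
```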
The search heads use site affinity to search only site1's indexer cluster.
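That affinity is just the search head declaring itself a member of site1; a sketch (the master hostname and key are placeholders):

```ini
# server.conf on each search head (sketch -- host name and key are placeholders)
[general]
site = site1

[clustering]
mode = searchhead
master_uri = https://indexmaster:8089
multisite = true
pass4SymmKey = <your cluster key>
```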
This setup gives me a multisite cluster with two sites: one with performance & capacity, and one used only for holding a single replicated copy of all the indexes.
I have some thoughts about your configuration.
First of all, NFS is not recommended by Splunk for use with indexers. If possible, use local storage or a reliable storage subsystem.
In your setup you will only back up the replicated buckets, never the original ones, so your backup strategy misses the original data. (Side note: we had some issues with replicated buckets when they were made searchable, especially with buckets that contained Windows event logs.)
Another thought regarding multisite clustering backup: if you back up all the data from each indexer, you will probably back up more data than needed. I would back up only the original buckets (db_*) and write a blacklist for the replicated buckets (rb_*). Why? The cluster will automatically replicate the original bucket again if the cluster master doesn't find any copies.
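Such a blacklist can be as simple as an exclude pattern in the backup job; a sketch using GNU tar (the bucket tree built here is a throwaway demo, and the bucket names are invented -- in production, point SPLUNK_DB at the real index path):

```shell
# Sketch: archive only original (db_*) buckets and skip replicated (rb_*)
# copies, since the cluster master can re-replicate them after a restore.
# The demo builds a throwaway bucket tree; names and paths are illustrative.
SPLUNK_DB=$(mktemp -d)
mkdir -p "$SPLUNK_DB/defaultdb/db/db_1428_1425_1" \
         "$SPLUNK_DB/defaultdb/db/rb_1428_1425_2"
touch "$SPLUNK_DB/defaultdb/db/db_1428_1425_1/Hosts.data" \
      "$SPLUNK_DB/defaultdb/db/rb_1428_1425_2/Hosts.data"

ARCHIVE=/tmp/index-backup.tar.gz
# --exclude='rb_*' drops any path component matching rb_*, i.e. all
# replicated bucket directories, from the archive.
tar czf "$ARCHIVE" --exclude='rb_*' -C "$SPLUNK_DB" .
tar tzf "$ARCHIVE"    # db_* bucket is listed; rb_* is not
```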