Afternoon Splunk Community,
I'm currently in charge of helping sunset an old subsidiary of ours and putting their infrastructure out to pasture. As part of the decommission process I need to back up the indexed data from their Splunk instance for long-term retention, so that we can restore it should we ever need to view it for any reason.
The Splunk indexing tier consists of six indexers across two sites: three in site 1 and three in site 2. The cluster master has a site replication factor of "origin:2 total:3", which means that each site should contain two copies of each bucket that originated within that site, and a single copy of each bucket that did not.
In an ideal world I think this would mean that I only need to back up the data volume of a single indexer in the cluster to have a copy of all indexed data. However, I've read that when taking a backup of a clustered indexer there is no guarantee that any single indexer contains all of the data in the cluster, even with replication enabled.
I have several questions:
Resources:
Splunk Community - How to backup/restore Splunk db to new system
1. True. Since any given bucket can be (assuming the cluster is complete) on "any" three of your six indexers, there's no guarantee that any single indexer contains a copy of each bucket. And in fact it shouldn't - that would mean you have a skewed data distribution, which you don't want.
2. There are several approaches you could take, depending on the resources you want to use and your goal. The easiest might be to make backups of the whole machines with whatever bare-metal-restore backup solution you already use and, in case of sudden need, restore the whole environment from scratch. There is one obvious downside (storage use) and one caveat - if you spin up a new environment and restore it from such a backup, your buckets will most probably roll to frozen very quickly (possibly immediately, depending on the period of inactivity and your retention settings).
On the other hand - if you want to minimize the amount of data stored, you could indeed go across all the buckets and deduplicate them so that you back up only a single copy of each bucket. You could even set coldToFrozenDir and lower your retention period to a minimum so that Splunk moves the data to frozen first, stripping out the index files that are unnecessary for backup purposes and leaving just the raw data and other "irrecoverable" parts. Then deduplicate and back up. Keep in mind that such reduced buckets must be rebuilt before they can be searched again.
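If you go that route, a minimal indexes.conf sketch might look like the following - the index name, archive path and retention value are just placeholders, and in a cluster you would normally push this from the cluster master's configuration bundle rather than editing the peers directly:

    # indexes.conf (example stanza - index name, path and retention are placeholders)
    [some_index]
    # archive frozen buckets here instead of deleting them
    coldToFrozenDir = /backup/splunk_frozen/some_index
    # minimal retention so buckets roll to frozen quickly (in seconds)
    frozenTimePeriodInSecs = 86400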
Of course you could also just search across all your indexes and export the data. It has its upsides (you can use such data with any software you want) and its downsides (if you want to use that data with Splunk again you'd have to re-ingest it, which might not be so easy if you want to achieve exactly the same results as you had before exporting; and of course re-ingesting the data would be both time-consuming and license-consuming).
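For example, something along these lines would stream events out through the search/jobs/export REST endpoint - the host, credentials, index name and time range are all assumptions you'd need to adjust:

    # export_events.py - stream raw events out of Splunk as CSV (rough sketch)
    import requests

    SEARCH_HEAD = "https://splunk-sh.example.com:8089"  # assumed management port
    AUTH = ("admin", "changeme")                         # placeholder credentials

    # export one index over all time; repeat per index or per time slice as needed
    payload = {
        "search": "search index=some_index earliest=0 latest=now",
        "output_mode": "csv",
    }

    with requests.post(f"{SEARCH_HEAD}/services/search/jobs/export",
                       data=payload, auth=AUTH, verify=False, stream=True) as resp:
        resp.raise_for_status()
        with open("some_index_export.csv", "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                out.write(chunk)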
3. Back up the configs. Back up the contents of the KV store on your SH(C). You might also want to consider whether you need the state files for modular inputs, so that if for some reason you'd like to recreate your inputs, they won't start ingesting the same data all over again.
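If your Splunk version supports the "splunk backup kvstore" CLI command, a small way to script that from the search head might be the following (the install path and archive name are assumptions):

    # backup_kvstore.py - trigger a KV store backup on the search head (rough sketch)
    import subprocess

    SPLUNK_BIN = "/opt/splunk/bin/splunk"  # assumed install location

    # by default the archive is written under $SPLUNK_HOME/var/lib/splunk/kvstorebackup
    subprocess.run([SPLUNK_BIN, "backup", "kvstore", "-archiveName", "decom_kvstore"],
                   check=True)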
As you said, there's no guarantee that one indexer will have a copy of all data. Nor can you rely on the name of each bucket to determine which are primaries so that only those are backed up. I haven't done it myself, but consider creating a script that compares each indexer's GUID to the GUID in the bucket name and backs up only the buckets that match.
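Untested, but such a script might look roughly like this - the paths, and the assumption that clustered bucket directories end in the originating peer's GUID (db_<newest>_<oldest>_<id>_<guid>, with replicated copies prefixed rb_), would need to be verified against your environment:

    # backup_origin_buckets.py - copy only buckets that originated on this peer (rough sketch)
    import configparser
    import shutil
    from pathlib import Path

    SPLUNK_HOME = Path("/opt/splunk")             # assumed install location
    INDEX_ROOT = SPLUNK_HOME / "var/lib/splunk"   # assumed index storage path
    BACKUP_ROOT = Path("/backup/splunk_buckets")  # assumed backup destination

    # the peer's GUID lives in etc/instance.cfg under [general]
    cfg = configparser.ConfigParser()
    cfg.read(SPLUNK_HOME / "etc/instance.cfg")
    my_guid = cfg["general"]["guid"]

    # look at warm (db) and cold (colddb) buckets; hot buckets should be rolled first
    for subdir in ("db", "colddb"):
        for bucket in INDEX_ROOT.glob(f"*/{subdir}/db_*"):
            if bucket.name.endswith(my_guid):  # bucket originated on this peer
                dest = BACKUP_ROOT / bucket.parent.parent.name / bucket.name
                shutil.copytree(bucket, dest, dirs_exist_ok=True)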
Be sure to roll hot buckets to warm before backing them up.
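One way to do that per index is the roll-hot-buckets REST endpoint on each peer - a minimal sketch, assuming the default management port and placeholder credentials and index names:

    # roll_hot_buckets.py - ask a peer to roll its hot buckets to warm (rough sketch)
    import requests

    PEER = "https://splunk-idx1.example.com:8089"  # assumed management port
    AUTH = ("admin", "changeme")                    # placeholder credentials

    for index in ("some_index", "another_index"):   # placeholder index names
        resp = requests.post(f"{PEER}/services/data/indexes/{index}/roll-hot-buckets",
                             auth=AUTH, verify=False)
        resp.raise_for_status()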
When restoring a backup, put the buckets in the thawed directory (thaweddb) and follow the procedure for doing so, rather than putting them back where they came from. Thawed data isn't subject to retention, so using thawed prevents old data from being frozen before you can search it.
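A rough sketch of that restore step, assuming the archived buckets were frozen with only rawdata left (the install location, archive path and index name are placeholders):

    # thaw_buckets.py - copy archived buckets into thaweddb and rebuild them (rough sketch)
    import shutil
    import subprocess
    from pathlib import Path

    SPLUNK_BIN = "/opt/splunk/bin/splunk"               # assumed install location
    ARCHIVE = Path("/backup/splunk_frozen/some_index")  # assumed archive location
    THAWED = Path("/opt/splunk/var/lib/splunk/some_index/thaweddb")

    for bucket in ARCHIVE.glob("db_*"):
        dest = THAWED / bucket.name
        shutil.copytree(bucket, dest, dirs_exist_ok=True)
        # archived buckets keep only rawdata, so rebuild the index files before searching
        subprocess.run([SPLUNK_BIN, "rebuild", str(dest)], check=True)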
It's a good idea to back up $SPLUNK_HOME/etc. You'll need that to restart the indexers.