Afternoon Splunk Community,
I'm currently in charge of helping sunset an old subsidiary of ours and putting their infrastructure out to pasture. As part of the decommission process I need to back up the indexed data from their Splunk instance for long-term retention, so that it can be restored if we ever need to view it again for any reason.
The Splunk indexing tier consists of six total indexers across two sites, three indexers in each site. The cluster master is configured with a site replication factor of "origin:2, total:3", which means the site where a bucket originates holds two copies of it and the cluster as a whole holds three, so the other site holds the remaining copy.
In an ideal world I think this would mean I only need to back up the data volume of a single indexer to capture all indexed data. However, I've read that there is no guarantee that any single indexer in a cluster holds a copy of every bucket, even with replication enabled, because bucket copies are distributed across the peers rather than mirrored to each one.
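Before committing to an approach, I'm planning to sanity-check bucket coverage per peer with something like the following. This is a rough sketch, not a vetted procedure: it assumes the CLI is run from a search head that can see all six peers, and that the account used can search every index.

```
# Count how many unique bucket IDs each peer holds versus the cluster-wide
# total, to see whether any single indexer actually has everything.
splunk search '| dbinspect index=*
  | eventstats dc(bucketId) AS total_buckets
  | stats dc(bucketId) AS buckets_on_peer, max(total_buckets) AS total_buckets BY splunk_server
  | eval pct_coverage = round(100 * buckets_on_peer / total_buckets, 1)' \
  -auth admin:placeholder
```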
I have several questions:
What is the best practice for archiving indexed data for potential restoration into another Splunk instance at a later date?
On each of my indexers, "/var/lib/splunk" is a symlink to a dedicated AWS EBS volume. Should I retain the EBS volumes from all six indexers, from one indexer in each of my two sites, or can I retain a single EBS volume from one indexer and discard the rest? My thinking is that if I ever needed to restore the archived data for searching, I could stand up a new Splunk indexer, attach the archived EBS volume, and point a search head at the new indexer in order to search the data.
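For reference, this is roughly the restore flow I'm picturing for the EBS approach. All IDs and device names below are placeholders, and it assumes the archived volume lands in the same availability zone as the new indexer and that /var/lib/splunk is where SPLUNK_DB points:

```
# Placeholder volume, instance, and device identifiers.
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0fedcba9876543210 \
  --device /dev/sdf

# On the instance: mount where SPLUNK_DB expects it, fix ownership, start.
sudo mount /dev/xvdf /var/lib/splunk   # may appear as /dev/nvme1n1 on Nitro instances
sudo chown -R splunk:splunk /var/lib/splunk
sudo -u splunk /opt/splunk/bin/splunk start
```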
Or is a better approach to simply stop splunkd on each indexer, create an archive of /var/lib/splunk with an archive utility, and then restore that archive to /var/lib/splunk on a new indexer if we ever need the data again?
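Something like this is what I have in mind (paths assume a default /opt/splunk install with splunkd running as the splunk user):

```
# On the indexer being archived; stop splunkd first so buckets are quiescent.
sudo -u splunk /opt/splunk/bin/splunk stop
sudo tar -czpf /backup/idx1-var-lib-splunk.tar.gz -C /var/lib splunk

# Later, on a replacement indexer (with splunkd stopped):
sudo tar -xzpf /backup/idx1-var-lib-splunk.tar.gz -C /var/lib
sudo chown -R splunk:splunk /var/lib/splunk
sudo -u splunk /opt/splunk/bin/splunk start
```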
As a last resort, I could run a search against each of my indexes and export the results in a human-readable format (CSV, XLSX, etc.).
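If it comes to that, I'd probably script it with the CLI rather than the UI export, along these lines. The index name and credentials are placeholders, and -maxout 0 lifts the default result cap:

```
# Placeholder index name and credentials; repeat once per index.
splunk search 'index=subsidiary_web earliest=0' \
  -output csv -maxout 0 -auth admin:placeholder > subsidiary_web.csv
```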
I've already checked all of my knowledge artifacts (apps, configuration files, etc.) into Git for record keeping. However, are there any other portions of my Splunk deployment that I should consider backing up? For example, should I retain a copy of my cluster master's data volume for any reason?
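For the cluster master specifically, my tentative plan is just to grab its $SPLUNK_HOME/etc in addition to what's in Git, since that's where the pushed configuration bundle and cluster settings live (path assumes a default /opt/splunk install):

```
# On the cluster master: captures master-apps (the bundle pushed to peers)
# and server.conf (pass4SymmKey, site/replication factor settings).
sudo tar -czpf /backup/cluster-master-etc.tar.gz -C /opt/splunk etc
```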
If I do need to restore archived data, can I restore it to a standalone indexer, or does it have to go back into an indexer cluster? The answer to this one seems obvious to me, but I figured I should ask nonetheless.
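Assuming a standalone indexer is viable, I imagine the flow would look something like this. The bucket name and paths are placeholders, and the rebuild step should only be needed if the tsidx files were stripped when the bucket was archived:

```
# Copy an archived bucket into the target index's thaweddb on the
# standalone indexer, rebuild its index files if necessary, then restart.
cp -rp /backup/db_1389230491_1389230488_5 \
  /opt/splunk/var/lib/splunk/defaultdb/thaweddb/
/opt/splunk/bin/splunk rebuild \
  /opt/splunk/var/lib/splunk/defaultdb/thaweddb/db_1389230491_1389230488_5
/opt/splunk/bin/splunk restart
```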
Resources:
Splunk - Back up Indexed Data
Splunk Community - How to backup/restore Splunk db to new system