
Is backing up data necessary in an indexer cluster?


Since data in an indexer cluster is replicated to other indexers, is backing it up somewhere else strictly necessary? I'm already backing up the configuration data in $SPLUNK_HOME/etc/ daily, but I'm not sure whether I should also implement indexed data backups, because it seems to me that replication could be enough. A complete failure of the cluster doesn't seem likely, since it's hosted in AWS (famous last words, I know). The section on backing up indexed data in the docs isn't conclusive either.

Any thoughts and experiences with this?

It depends on your recovery requirements and the amount of risk you're willing to accept, I guess.

Replication covers several data loss risks, but not all of them. As you already mentioned, losing more indexers (or underlying AWS infrastructure) than your replication factor can absorb can still lead to data loss. So can logical corruption or deletion caused by a Splunk bug, an admin mistake, or a malicious act, none of which replication is designed to protect against.
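To make the replication factor point concrete, here is a minimal sketch in Python. It relies on the documented rule that a cluster can tolerate the failure of (replication factor - 1) peer nodes. The function names are made up for illustration, and the check assumes the worst case where every failed peer held a copy of the same bucket. Multisite clusters are ignored.

```python
def tolerated_peer_failures(replication_factor: int) -> int:
    """Number of simultaneous peer failures a cluster can absorb."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def replication_protects(replication_factor: int, simultaneous_failures: int) -> bool:
    """Worst-case check: do we still have at least one copy of every bucket?

    Assumes all failed peers held copies of the same bucket.
    Says nothing about logical corruption or deletion.
    """
    return simultaneous_failures <= tolerated_peer_failures(replication_factor)


if __name__ == "__main__":
    for failures in range(0, 4):
        print(failures, replication_protects(3, failures))
```

Note that this only models hardware-style failures. A mistaken or malicious deletion falls outside the check entirely, which is where a separate backup comes in.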

In the end, it is a balance between the cost of a full backup and the risk (likelihood * impact) of data loss, and that is a decision only you can make. If you just use Splunk to monitor some IT systems, the situation is very different from using it to make critical business decisions, or to store security/compliance logs that regulations oblige you to keep for x amount of time.
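As a rough illustration of that trade-off, the comparison can be written as a few lines of Python. This is only a sketch: the function names and all the numbers are hypothetical placeholders, and a real assessment would also weigh things that don't reduce to money, such as regulatory obligations.

```python
def expected_annual_loss(likelihood_per_year: float, impact_cost: float) -> float:
    """Risk expressed as likelihood * impact, per year."""
    return likelihood_per_year * impact_cost


def backup_is_worth_it(likelihood_per_year: float,
                       impact_cost: float,
                       annual_backup_cost: float) -> bool:
    """True when the expected loss exceeds what the backup would cost."""
    return expected_annual_loss(likelihood_per_year, impact_cost) > annual_backup_cost


if __name__ == "__main__":
    # Hypothetical figures, purely for illustration.
    monitoring_only = backup_is_worth_it(0.02, 20_000, 6_000)
    compliance_logs = backup_is_worth_it(0.02, 500_000, 6_000)
    print("IT monitoring only:", monitoring_only)
    print("Compliance logs:", compliance_logs)
```

With the same likelihood and backup cost, the answer flips purely because the impact of losing compliance data is so much higher than losing monitoring data.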

