High availability: Splunk cluster across two datacenters - redux

What is the best practice for distributing data across two or more data centers?

The objective is geographic redundancy, i.e. to survive the loss of one data center.

The ideal would seem to be:

  • forwarders send events to the nearest data center, or load-share across multiple data centers
  • indexed data is replicated so that each data center has a complete set of data (or, for more than two data centers, replicated so that no single data center holds more than one copy of each block of data)

This doesn't seem to be supported, at least as of Splunk 6.0.

Instead, the options seem to be:

  1. Have a single cluster with a replication factor high enough that every block is guaranteed to exist in more than one data center.
  2. Have one cluster per data center and forward all events to all data centers.

Option 1 isn't practical because, to be sure that at least one copy is off site, the replication factor needs to be greater than the number of indexers at any one data center.
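To make the arithmetic behind Option 1 concrete, here is a rough sketch in Python. It assumes the cluster puts each copy on a different indexer and has no notion of sites, so in the worst case every copy lands in the largest data center. The site names and indexer counts are made up for illustration, and `min_replication_factor` is a hypothetical helper, not anything Splunk provides.

```python
def min_replication_factor(indexers_per_site):
    """Smallest replication factor that forces at least one copy off site.

    Assumption: copies go to distinct indexers, chosen without regard to
    site. The worst case fills the largest site first, so one more copy
    than that site has indexers must spill over to another site.
    """
    return max(indexers_per_site.values()) + 1


def is_feasible(indexers_per_site):
    """The cluster needs at least as many indexers as copies."""
    total = sum(indexers_per_site.values())
    return min_replication_factor(indexers_per_site) <= total


# Hypothetical layout: 4 indexers in one data center, 3 in the other.
sites = {"dc1": 4, "dc2": 3}
print(min_replication_factor(sites))  # 5 copies of every block
print(is_feasible(sites))             # True, 7 indexers can hold 5 copies
```

With these numbers every block would have to be stored five times just to guarantee a single off-site copy, which is why the required replication factor grows with the size of the largest site rather than with the number of sites.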

Option 2 isn't great either, because you lose the benefits of clustering.

What then is the best practice recommendation?


I would like to know that as well, as I haven't found a great option so far. I was under the impression that Splunk would introduce failure zone awareness with Splunk 6, but that doesn't seem to have happened. Until failure zone awareness is available natively, I would rather go with a multi-cluster setup with external replication and hot standby.