What is the best practice for distributing data across two or more data centers?
The objective is geographic redundancy, i.e. to survive the loss of one data center.
The ideal would seem to be:
This doesn't seem to be supported, at least as of R6.0.
Instead, the options seem to be:
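Option 1 isn't practical because the replication factor needs to be greater than the number of indexers at any one data center. Otherwise every copy of a bucket could land on indexers at the same site, and there would be no guarantee that at least one copy is off site.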
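To put numbers on that constraint, here is a minimal sketch in Python. It assumes the cluster places each copy on a distinct indexer and has no awareness of sites, so in the worst case all copies end up at the largest site. The function name and the site names are made up for illustration; none of this is a Splunk API.

```python
def min_replication_factor(indexers_per_site):
    """Smallest replication factor that guarantees an off-site copy.

    indexers_per_site maps a (hypothetical) site name to its indexer count.
    Assumption: copies go to distinct indexers, chosen without regard to site,
    so the worst case fills the largest site before any copy leaves it.
    """
    if len(indexers_per_site) < 2:
        raise ValueError("need at least two sites for geographic redundancy")
    if min(indexers_per_site.values()) < 1:
        raise ValueError("every site needs at least one indexer")
    # One more copy than the largest site can hold forces a copy elsewhere.
    return max(indexers_per_site.values()) + 1


if __name__ == "__main__":
    sites = {"dc1": 4, "dc2": 4}  # hypothetical layout
    rf = min_replication_factor(sites)
    print(f"replication factor must be at least {rf}")
    print(f"each bucket is then stored {rf} times across the cluster")
```

With four indexers per site, every bucket would have to be stored five times just to be sure one copy is remote, which is why this option gets expensive quickly.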
Option 2 isn't great either, because you lose the benefits of clustering.
What then is the best practice recommendation?
I would like to know that as well, as I haven't found a great option so far. I was under the impression that Splunk would introduce failure zone awareness with Splunk 6, but that doesn't seem to have happened(?!). Until failure zone awareness is available natively, I would rather go with a multi-cluster setup with external replication and hot standby.