I have a scenario and was wondering if somebody could confirm what would happen...
Let's pretend we're the Federation. Obviously we have a lot of data across the galaxy, but we're really quite interested in our local star system.
To that end we've decided to install Splunk v5 (we obtained it after a time-travel-related accident). So on the Enterprise we install an indexer, Excelsior also has an indexer, and Spacedock (in orbit of Earth) has two indexers (it's generating quite a lot of data).
Finally, on the moon we have another two indexers.
On Earth we have a couple of search heads dotted around. Because we've installed v5, we decide to set up a cluster to make sure we have HA across the fleet. The master is located on a dedicated terminal on Spacedock.
All works well for a week. Then a giant cylindrical thing with a giant floating ball and crazy whale noises appears, and no one has any idea what's going on when suddenly Spacedock loses all power.
Our master and two indexers are taken offline.
According to the docs, the search heads will keep trying to search across their previously known indexers. So in this case HA has actually failed, and we have no protection against the loss of an entire site if the master is located on that site. Is that correct? Is there any way to mitigate or protect against this (short of sticking the master on a satellite)?
Thanks for any opinions or views, the more the merrier.
I'll try to summarize your question to make sure I have it right: if my master and 2 peers fail in a cluster with a replication factor of 3, what will happen?
In this case, the cluster won't be able to take corrective action to recover until a master (the original or a new one) is brought back into the cluster. Although there are some plans to make the master redundant in a future release, in 5.0 there is no notion of multiple masters. However, one nice property of the 5.0 master is that it persists no data. If your master completely blows up, you just have to stand up a separate machine configured with the exact same clustering stanza in server.conf. As long as the master_uri doesn't change from the peers' and search heads' point of view, the new master will be able to reconstruct its state once all peers have registered themselves against it. This fact can be used to set up a fail-over master node with DNS or virtual IP tricks. This is of course not first-class support for master redundancy, but it may be a suitable workaround for some folks.
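As a rough illustration of the fail-over idea, here is a minimal sketch of the relevant server.conf stanzas. The hostname cluster-master.starfleet.example and the replication port 9887 are made-up placeholders, not values from this thread, and any other settings in your real clustering stanza should be copied to the standby verbatim. The point is that peers and search heads reference the master by a name (or virtual IP) that you can repoint at the standby, so their master_uri never changes.

```
# server.conf on the master, and identically on the standby master
[clustering]
mode = master
replication_factor = 3

# server.conf on each peer (indexer)
# 9887 is a placeholder replication port; use whatever your peers already listen on
[replication_port://9887]

[clustering]
mode = slave
# Placeholder DNS name; repoint it (or the virtual IP behind it) at the standby on failure
master_uri = https://cluster-master.starfleet.example:8089

# server.conf on each search head
[clustering]
mode = searchhead
master_uri = https://cluster-master.starfleet.example:8089
```

With something like this in place, failing over would mean starting the standby master and moving the DNS record or virtual IP to it. In the scenario above, the standby would need to live somewhere other than Spacedock for this to help.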