Deployment Architecture

Splunk v5 Clustering and HA

Drainy
Champion

I have a scenario and was wondering if somebody could confirm what would happen...

Let's pretend we're the Federation. Obviously we have a lot of data across the galaxy, but we're really quite interested in our local star system.
To that end we've decided to install Splunk v5 (they obtained it after a time-travel-related accident). So on the Enterprise we install an indexer, Excelsior also has an indexer, and Spacedock (in orbit of Earth) has two indexers (it's generating quite a lot of data).
Finally, on the Moon we have another two indexers.
On Earth we have a couple of search heads dotted around. Because we've installed v5, we decide to set up a cluster to make sure we have HA across the fleet; the master is located on a dedicated terminal on the Spacedock.

All works well for a week when suddenly a giant cylindrical thing with a giant floating ball and crazy whale noises appears. No one has any idea what's going on, and then Spacedock loses all power.
Our master and two indexers are taken offline.

According to the docs, the search heads will continue to try to search across their previously known indexers, so in this case HA has actually failed: we have no redundancy against an entire-site failure if the master is located on that site. Is that correct? Is there any way to mitigate or protect against this? (Short of sticking the master on a satellite.)

Thanks for any opinions or views, the more the merrier.

1 Solution

Vishal_Patel
Splunk Employee

I'll try to summarize your question to make sure I have it right: if my master and two peers fail in a cluster with replication factor 3, what will happen?

In this case, the cluster won't be able to take corrective action to recover until a master (the original or a new one) is brought back into the cluster. Although there are plans to make the master redundant in a future release, in 5.0 there is no notion of multiple masters. However, one nice property of the 5.0 master is that it persists no data. If your master completely blows up, you just have to stand up a separate machine configured with the exact same clustering stanza in server.conf. As long as the master_uri from the peers'/search heads' point of view doesn't change, the new master will be able to reconstruct cluster state once all peers have registered themselves against it. You can use this to set up a fail-over master node with DNS or virtual-IP tricks. This is of course not first-class support for master redundancy, but it may be a suitable workaround for some folks.
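The recovery procedure above can be sketched as server.conf fragments. This is a Splunk 5.x-style sketch, not an official recipe; the hostname and shared secret are hypothetical placeholders, and the key point is that the peers reference the master through a name you can re-point:

```ini
# server.conf on the replacement master.
# Must carry the same clustering settings as the failed master,
# since the master itself persists no cluster data.
[clustering]
mode = master
replication_factor = 3
search_factor = 2
pass4SymmKey = changeme        # hypothetical shared secret; must match the peers

# server.conf on each peer (and, for mode = searchhead, each search head).
# master_uri must not change, so point it at a DNS name or virtual IP
# that can be re-mapped to the standby machine after a failure.
[clustering]
mode = slave
master_uri = https://cluster-master.example.com:8089
pass4SymmKey = changeme
```

With this layout, failing over means re-mapping `cluster-master.example.com` (or the virtual IP) to the standby box and restarting it; once the peers re-register, the new master rebuilds its view of the cluster.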

