Multisite Indexer & Search Head Clustering: Are there any other benefits of having 6 vs 4 replicas of our data across 9 indexers and 3 sites?

SplunkTrust

Hi all,

We are struggling to get our Splunk architecture proposal approved by our internal review process and one of the questions they keep raising is about our data resiliency plan.

This is our proposal at a very high level:

  • 3 sites (America, EMEA, APAC)
  • 1 x Search Head + 3 x Indexers per site
  • Search Replication Factor: 2, 2, 2 (6 copies of our data globally, 2 per site)
  • Splunk 6.3 with both multisite clustering and search head clustering

Our review board is asking why we don't keep 4 copies instead of 6, as in 2, 1, 1 (2 at the origin site, 4 in total).
Here are the pros and cons of keeping 6 copies of our data across 3 sites and 9 indexers.

PROS:

  • Cross-site resiliency -> one or two sites can be down at any single time and all our data would still be searchable from the third one
  • Intra-site resiliency -> We can afford to have one indexer down per site

CONS:

  • Cost (50% more storage than with 4 copies) and extra network bandwidth
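
To put a number on the storage cost, here is a minimal Python sketch of the arithmetic. It assumes every copy takes the same amount of disk, which is a simplification (in practice searchable and non-searchable copies differ in size); the function names are made up for illustration. Going from 4 copies to 6 is 50% more storage, while going from 6 down to 4 saves about 33%.

```python
def extra_storage_pct(proposed_copies: int, baseline_copies: int) -> float:
    """Extra storage of the proposed plan, as a percentage of the baseline plan."""
    return (proposed_copies - baseline_copies) / baseline_copies * 100


def savings_pct(proposed_copies: int, baseline_copies: int) -> float:
    """Storage saved by the baseline plan, as a percentage of the proposed plan."""
    return (proposed_copies - baseline_copies) / proposed_copies * 100


six_copies = 2 + 2 + 2   # 2 copies at each of the 3 sites
four_copies = 2 + 1 + 1  # 2 at the origin site, 1 at each remote site

print(f"6 vs 4 copies: {extra_storage_pct(six_copies, four_copies):.0f}% more storage")
print(f"4 vs 6 copies: {savings_pct(six_copies, four_copies):.0f}% less storage")
```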

Are there any other benefits of having 6 copies of our data instead of 4 that I can use to justify the extra cost?
Performance maybe? Anything else?

Thanks,
J

Splunk Employee

@javiergn The advantage of having two copies at each site, as you noted, is that you can have 2 remote sites completely down AND sustain a single local indexer failure. However, you have to ask yourself: what is the likelihood of losing two remote datacenters AND a local indexer?

With 2 copies at the origin site and 1 copy at each of the two remote sites (4 in total), you could still sustain an indexer failure at any of your sites and maintain searchability. The difference is in the fix-up: if an indexer fails at the site that originated the bucket, Splunk can re-replicate the data (bucket fix-up) intra-site, whereas at a remote site we'd have to reach across the WAN to re-replicate it.

You're not going to get any performance benefits from having multiple copies at each site, because only a single copy of each bucket is primary (searchable) at each site, regardless of how many copies there are. The only potential performance gain is faster bucket fix-up: with two copies at every site, we don't have to replicate data across the WAN, as we would for sites that hold only 1 copy of each bucket.
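
The fix-up difference between the two plans can be sketched in a few lines of Python. This is a toy model of the reasoning in this answer, not Splunk's actual fix-up logic: `fixup_source` and the site names are hypothetical, and it assumes the failed indexer held exactly one copy of the bucket in question.

```python
def fixup_source(copies_per_site: dict[str, int], failed_site: str) -> str:
    """Where a replacement copy can come from after one indexer at
    failed_site (holding one copy of the bucket) goes down."""
    remaining_local = copies_per_site[failed_site] - 1
    if remaining_local >= 1:
        return "intra-site"          # another local indexer still has a copy
    if any(n > 0 for site, n in copies_per_site.items() if site != failed_site):
        return "cross-site (WAN)"    # only remote sites still have a copy
    return "unrecoverable"


# Hypothetical bucket that originated in the America site.
plan_4 = {"america": 2, "emea": 1, "apac": 1}   # 2, 1, 1
plan_6 = {"america": 2, "emea": 2, "apac": 2}   # 2, 2, 2

for name, plan in (("4 copies", plan_4), ("6 copies", plan_6)):
    for site in plan:
        print(f"{name}, indexer lost at {site}: {fixup_source(plan, site)}")
```

With 4 copies, only a failure at the origin site is repaired locally; with 6 copies, every single-indexer failure is.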

Make sense?

SplunkTrust

Hi, thanks for your quick response.

I guess I'm also concerned about the maintenance implications.

Each search head will only search locally because of the multisite clustering. If we keep only a single copy of our data per site, spread across the 3 local indexers, then patching an indexer or an unexpected failure will effectively invalidate the whole site, because the two remaining indexers hold only about 66% of our data on average.

Any search still running there will be compromised, as its results won't be reliable anymore. The same goes for scheduled searches.
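
The 66% figure can be checked with a short Python sketch. It assumes each bucket's copies sit on distinct indexers chosen uniformly at random within the site (an assumption, not a statement about how Splunk places buckets), and it only looks at what is available locally, ignoring any copies at other sites. The function name is made up for illustration.

```python
from math import comb


def locally_available_fraction(indexers: int, copies: int, failed: int) -> float:
    """Expected fraction of buckets with at least one copy left on the site
    after `failed` of its `indexers` go down, given `copies` per bucket."""
    if not 0 < copies <= indexers or not 0 <= failed <= indexers:
        raise ValueError("need 0 < copies <= indexers and 0 <= failed <= indexers")
    # A bucket is lost locally only if all of its copies sit on failed indexers.
    lost = comb(failed, copies) / comb(indexers, copies)
    return 1 - lost


for copies in (1, 2):
    for failed in (1, 2):
        frac = locally_available_fraction(indexers=3, copies=copies, failed=failed)
        print(f"{copies} local copy/copies, {failed} indexer(s) down: {frac:.0%} available")
```

Under these assumptions, one local copy with one indexer down leaves about two thirds of the buckets on the site, while two local copies survive a single indexer failure completely.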

Is that a valid assumption?

Splunk Employee

If you're planning on deploying multisite indexer clustering with search head affinity (where a search head only uses the local indexers for searches) and you have a local failure or bring down a local indexer, the search head will automatically reach out to another site to fulfill search requests if there are no searchable buckets locally. So even for sites with a replication factor and search factor of 1, you can still fulfill searches by reaching out to another site until we can fix up the local buckets.

In 6.3, we also introduced the ability to turn off Search Head affinity so that all indexers across all sites participate in searches. This obviously requires that you have decent bandwidth and low latency between sites.
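
As a rough illustration of the behavior described in this answer, here is a toy Python model of which site ends up serving a given bucket. It is a sketch of the fallback idea only, not Splunk's implementation: `serving_site`, the `affinity` flag, and the site names are all hypothetical, and the remote site is picked alphabetically just to keep the example deterministic.

```python
from typing import Optional


def serving_site(searchable_at: set[str], local_site: str,
                 affinity: bool = True) -> Optional[str]:
    """Pick the site that serves one bucket for a search head at local_site.

    searchable_at: sites that currently hold a searchable copy of the bucket.
    """
    if not searchable_at:
        return None                      # no searchable copy anywhere
    if affinity and local_site in searchable_at:
        return local_site                # affinity: prefer the local copy
    # No local copy (or affinity turned off): any site may serve the bucket.
    return sorted(searchable_at)[0]


# Local copy available: the search stays on site.
print(serving_site({"america", "emea", "apac"}, "emea"))   # emea
# Local copy lost: fall back to a remote site until fix-up completes.
print(serving_site({"america", "apac"}, "emea"))           # america
# Affinity off: the bucket may be served by a remote site even if a local copy exists.
print(serving_site({"america", "emea"}, "emea", affinity=False))  # america
```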
