Deployment Architecture

Multisite replication factor issues

jpillai
Explorer

Hi all,

I'm testing multisite indexer clustering with below configuration and found an undesired behaviour in the case of a site failure.

available_sites = site1,site2
site_replication_factor = origin:2,site1:2,site2:2,total:4
site_search_factor = origin:1,site1:1,site2:1,total:2

As you can see I have configured the replication factor "origin:2,site1:2,site2:2,total:4" so that I will have 2 replicas in both sites. But, in the case of a site failure, I am observing that splunk will try to replicate locally in the site that is up and complete the 'total:4' condition. I think this can be a problem when the available disk space on the machines is less.

Let's say site2 indexer machines are at 80% disk space usage and site1 fails - now when splunk tries to create 4 replicas in the same site (site2) due to site failure, it can easily exhaust the disks.

As per update from splunk support, this is default behaviour, but I feel there needs to be additional control over this. Any advise or suggestions around this issue will be really helpful. Thank you.

Labels (1)
0 Karma

jpillai
Explorer

Update: The kind of failures we are usually expecting are network failures, where the failed site will be back in few hours. In the mean time we might not want 4 replicas in the same site that is up. Or in case we need additional replicas in any case, we want to do it manually

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!