As you can see, I have configured the replication factor as "origin:2,site1:2,site2:2,total:4" so that each site holds 2 replicas. However, when a site fails, I observe that Splunk replicates locally within the surviving site to satisfy the 'total:4' condition. I think this can be a problem when the available disk space on those machines is low.
Let's say the site2 indexers are at 80% disk usage and site1 fails. When Splunk then tries to create 4 replicas within the same site (site2) because of the site failure, it can easily exhaust the disks.
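To make the concern concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under simplifying assumptions that are mine, not Splunk's: all disk usage on site2 comes from bucket copies, every copy is the same size, and usage is considered in aggregate across the site. The function name `projected_disk_usage` is hypothetical.

```python
def projected_disk_usage(current_pct: float, current_copies: int, target_copies: int) -> float:
    """Scale current usage by the ratio of target to current copy counts.

    Assumes (simplification) that all disk usage is bucket copies of equal size.
    """
    return current_pct * target_copies / current_copies


# Scenario from above: site2 at 80% while holding 2 copies, then asked to hold 4.
usage = projected_disk_usage(80.0, current_copies=2, target_copies=4)
print(f"projected usage: {usage:.0f}%")
print("exceeds capacity" if usage > 100 else "fits")
```

Under these assumptions, the surviving site would need about twice its current storage. Real numbers will differ, since non-searchable copies and other data on disk change the ratio.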
According to Splunk support, this is the default behaviour, but I feel there needs to be additional control over it. Any advice or suggestions on this issue would be really helpful. Thank you.
Update: The failures we usually expect are network failures, where the failed site is back within a few hours. In the meantime, we might not want 4 replicas in the site that is still up. And if we do need additional replicas, we want to create them manually.