Scenario: a multi-site cluster with two sites, site1 and site2:
site_replication_factor = origin:2,total:3
site_search_factor = origin:2,total:3
bucket12345 has 2 copies in site1 (origin) and 1 copy in site2.
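For context, the scenario above corresponds to a cluster master configuration along these lines (an illustrative sketch; the exact stanza depends on your deployment):

```ini
# server.conf on the cluster master (illustrative sketch)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
site_replication_factor = origin:2,total:3
site_search_factor = origin:2,total:3
```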
When a copy of the bucket is deleted in the origin site (the rb_* copy on site1), the cluster master (CM) kicks off a fixup job to make a new copy of that bucket. I see it being copied from an indexer in site2 instead of an indexer in site1. I expected Splunk to use a copy in the same site as the source, but it isn't doing that.
Why?
The logic behind bucket replication sourcing works like this:
1) We will prefer a local site source for RF replication.
2) However, if every local source is already at its maximum capacity for how many replications it can be involved in (max_peer_rep_load), then we will go cross-site for RF replication.
3) For SF replication, there is no preference; source selection ends up being random.
On the CM in server.conf: [clustering] max_peer_rep_load can be used to throttle up/down how many replication jobs are happening at once. Lowering this will slow down non-streaming (warm/cold) bucket replication, but will not affect streaming (hot) bucket replication. This value represents "slots" for each indexer to participate in non-streaming replication, either as a source or as a target.
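To adjust the concurrency, the setting goes under the [clustering] stanza in the CM's server.conf, for example (5 is the default in the server.conf spec):

```ini
# server.conf on the cluster master
[clustering]
# Slots per peer for non-streaming (warm/cold) bucket replication,
# counting both source and target roles on each indexer.
max_peer_rep_load = 5
```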
Huh, what? Need an example.
Imagine 3 peers on site1 with buckets A and B that we want replicated intra-site (site1 needs 2 copies of both A and B), max_peer_rep_load=1 (for a simplified example), and 1 peer on site2:
Site1:
Peer1 - Bucket A
Peer2 - Bucket B
Peer3 - Bucket C
Site2:
Peer4 - Bucket B
We may trigger replication of Bucket A as Peer1 -> Peer2. Since Peer1 and Peer2 are now involved in a replication, the single "peer rep" slot on each is taken.
Peer3 still has a slot available, so it can receive a replication of Bucket B, but since Peer2 (the local holder of B) has no slot free, the source has to come from an outside site (Peer4 in site2), triggering an inter-site copy.
Unfortunately, when we fix buckets, we fix them in a fixed (but effectively arbitrary) order, and if the bucket we're scheduling next for replication doesn't have an available source on the local site, it will be sourced from an alternate site.
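The walkthrough above can be sketched as a toy model (this is not Splunk source code, just an illustration of the slot-based sourcing described here): prefer a same-site source, but fall back to another site when every local holder's rep slot is full.

```python
# Toy model of slot-based replication source selection (not Splunk code).
MAX_PEER_REP_LOAD = 1  # slots per peer, as in the simplified example

class Peer:
    def __init__(self, name, site, buckets):
        self.name, self.site, self.buckets = name, site, set(buckets)
        self.load = 0  # replication jobs this peer is currently involved in

    def has_slot(self):
        return self.load < MAX_PEER_REP_LOAD

def pick_source(peers, bucket, target):
    """Pick a source holding `bucket`, preferring the target's own site."""
    candidates = [p for p in peers
                  if bucket in p.buckets and p is not target and p.has_slot()]
    local = [p for p in candidates if p.site == target.site]
    # Prefer a same-site source; otherwise go cross-site; else give up.
    return (local or candidates or [None])[0]

peers = [
    Peer("Peer1", "site1", {"A"}),
    Peer("Peer2", "site1", {"B"}),
    Peer("Peer3", "site1", {"C"}),
    Peer("Peer4", "site2", {"B"}),
]
p1, p2, p3, p4 = peers

# Replicate bucket A: Peer1 -> Peer2 consumes both peers' only slot.
src = pick_source(peers, "A", p2)
src.load += 1; p2.load += 1
print(f"A: {src.name} -> Peer2")   # A: Peer1 -> Peer2

# Replicate bucket B to Peer3: Peer2 holds B but has no free slot,
# so the only eligible source is Peer4 on site2 (an inter-site copy).
src = pick_source(peers, "B", p3)
print(f"B: {src.name} -> Peer3")   # B: Peer4 -> Peer3
```

Raising max_peer_rep_load in this toy model would leave a slot free on Peer2, letting the second replication stay intra-site.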
A huge thank you to @dxu_splunk for the background to answer the question.
-dave