Question about fix-up tasks. Scenario: an indexer goes down, so the CM starts the fix-up tasks. If the indexer returns to service BEFORE the fix-up tasks are completed, does the CM cancel the fix-up tasks, or does it complete them and you just end up with excess buckets?
The concern is about the network throughput needed during recovery.
The CM will schedule at most 5 concurrent replication fixups and 3 concurrent searchable fixups per indexer. Fixups that are already scheduled (up to 5/3) won't be canceled. However, the remaining buckets that weren't scheduled won't lead to excess buckets, because those jobs won't be scheduled once the indexer recovers.
time1 - indexer A goes down with 1000 buckets.
time2 - CM starts scheduling jobs to fix up RF/SF. It schedules at most 5 RF / 3 SF jobs per indexer at a time; as these jobs complete, more are scheduled.
time3 - indexer A comes back up.
If during time2 we scheduled and fixed 50 jobs, there'll be 50 excess RF/SF copies. The remaining 950 buckets that weren't fixed won't have any excess copies.
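To make the timeline concrete, here is a toy simulation. It is a sketch under my own assumptions, not how the CM is actually implemented: every fixup is assumed to take exactly one scheduling round, the CM is assumed to refill free slots each round up to the per-indexer limit (5 for RF, 3 for SF), and the names `simulate` and `Outcome` are hypothetical. Jobs already scheduled when indexer A returns are allowed to finish (not canceled), and anything still pending is simply never scheduled.

```python
from dataclasses import dataclass

# Per-indexer concurrency limits quoted in the answer above.
MAX_RF_FIXUPS = 5
MAX_SF_FIXUPS = 3


@dataclass
class Outcome:
    completed: int        # fixups that finished before indexer A came back
    in_flight: int        # fixups scheduled but still running at recovery (not canceled)
    never_scheduled: int  # buckets whose fixup was never scheduled

    @property
    def excess_copies(self) -> int:
        # Every scheduled fixup (finished or still running) ends up as an extra
        # copy once indexer A's original copy is back; unscheduled ones add none.
        return self.completed + self.in_flight


def simulate(buckets: int, rounds_until_recovery: int, limit: int = MAX_RF_FIXUPS) -> Outcome:
    """Hypothetical toy model: each fixup takes one round; free slots refill every round."""
    pending, completed, in_flight = buckets, 0, 0
    for _ in range(rounds_until_recovery):
        completed += in_flight           # last round's jobs finish
        in_flight = min(limit, pending)  # refill slots up to the concurrency limit
        pending -= in_flight
    return Outcome(completed, in_flight, pending)


result = simulate(buckets=1000, rounds_until_recovery=10)
print(result, result.excess_copies)
# Outcome(completed=45, in_flight=5, never_scheduled=950) 50
```

With 10 rounds before recovery, 50 fixups get scheduled (45 done, 5 still running), which matches the 50 excess copies in the example; the other 950 buckets produce no excess. Passing `limit=MAX_SF_FIXUPS` models the searchable fixups instead.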
If network bandwidth is a concern, there is a setting, new in 7.2, that caps how much bandwidth each indexer uses for fixup (non-hot replication) operations.
server.conf
max_nonhot_rep_kBps = <integer>
* This is the maximum throughput (kB(Bytes)/s) for warm/cold/summary
  replications on a specific source peer. Similar to forwarder's maxKBps
  setting in the limits.conf file.
* This setting throttles total bandwidth consumption for all
  outgoing non-hot replication connections from a given source peer.
  It does not throttle at the 'per-replication-connection', per-target
  level.
* This setting is reloadable without restart if manually updated on the
  source peers by using the command "splunk edit cluster-config"
  or by making the corresponding REST call. We don't recommend updating
  this setting across all the peers using bundle push because:
  1) The push requires a rolling restart, as do all bundle pushes
     with the server.conf file change.
  2) You might want to set different values on different peers.
* If set to 0, signifies unlimited throughput.
* Default: 0
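For a rough sense of what a cap means for recovery time, the arithmetic can be sketched as below. This is a back-of-the-envelope lower bound under my own assumptions, not documented Splunk behavior: the data to re-replicate is assumed to be spread evenly across the source peers, each source peer is assumed to sustain exactly `max_nonhot_rep_kBps`, and the function name `min_fixup_seconds` and all example numbers are hypothetical. Because the spec says the cap applies to all outgoing non-hot replication from a peer combined (not per connection), concurrent fixups from the same source peer share that one cap, so the bound scales with the number of source peers rather than the number of jobs.

```python
from typing import Optional


def min_fixup_seconds(total_kb: float, cap_kbps: int, source_peers: int) -> Optional[float]:
    """Hypothetical lower bound on transfer time when each source peer is capped.

    Assumes the data is spread evenly over source_peers and every peer sustains
    exactly cap_kbps. A cap of 0 means unlimited (per the spec), so the cap
    imposes no bound and None is returned.
    """
    if total_kb < 0 or cap_kbps < 0 or source_peers <= 0:
        raise ValueError("total_kb and cap_kbps must be >= 0; source_peers must be > 0")
    if cap_kbps == 0:
        return None
    # The cap covers all outgoing non-hot replication from one peer combined,
    # so aggregate throughput is cap * peers regardless of how many jobs run.
    return total_kb / (cap_kbps * source_peers)


# Hypothetical numbers: 950 buckets of ~100,000 kB each, 10,240 kB/s cap, 4 source peers.
seconds = min_fixup_seconds(950 * 100_000, cap_kbps=10_240, source_peers=4)
print(f"at least {seconds / 60:.1f} minutes")
# at least 38.7 minutes
```

Keep `total_kb` and `cap_kbps` in the same unit; the spec describes the setting as kB(Bytes)/s. Real fixups will take longer than this bound, since peers also carry other replication and indexing traffic.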