Solved: [IndexerCluster] Impact on fix-up task if the down indexer recovers during fix-up task recovery

rbal_splunk · ‎11-27-2018

A question about fix-up tasks. Scenario: an indexer goes down, so the CM starts the fix-up. If the indexer returns to service BEFORE the fix-up tasks are completed, does the CM cancel the fix-up tasks, or does it complete them and leave you with excess buckets?

The concern is about the network throughput needed during recovery.

rbal_splunk · ‎11-27-2018

The CM will schedule at most 5 concurrent replication fixups and 3 concurrent searchable fixups per indexer. Those that are already scheduled (5/3) won’t be canceled. However, the remaining buckets that weren’t “scheduled” won’t lead to excess buckets, since their jobs won’t be scheduled after the indexer recovers.

time1 - Indexer A goes down with 1000 buckets.
time2 - CM starts scheduling jobs to fix up RF/SF. CM will schedule at most 5 RF/3 SF jobs per indexer. As these jobs complete, more will be scheduled.
time3 - Indexer A comes back up.

If, during time2, we scheduled and fixed 50 jobs, there’ll be 50 excess RF/SF copies. The remaining 950+ buckets that weren’t fixed won’t have any excess.
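The timeline above can be sketched as a toy model. This is only an illustration under stated assumptions, not how the CM is actually implemented: `simulate_fixup`, `FixupOutcome`, and the notion of a "round" (all scheduled jobs finish, then the freed slots are refilled) are hypothetical, and the model uses a single pool of 5 replication slots, whereas the real limit applies per indexer. Jobs that are still in flight when the indexer returns are counted as excess too, since scheduled jobs are not canceled.

```python
from dataclasses import dataclass

# Per-indexer replication fixup limit quoted in the answer above.
MAX_CONCURRENT_REPLICATION_FIXUPS = 5


@dataclass
class FixupOutcome:
    completed: int        # jobs finished before the indexer returned
    in_flight: int        # jobs scheduled but unfinished at return (not canceled)
    never_scheduled: int  # buckets whose fixup job was never scheduled

    @property
    def excess_copies(self) -> int:
        # Every job that was scheduled ends up producing an extra copy.
        return self.completed + self.in_flight


def simulate_fixup(total_buckets: int, rounds_before_return: int,
                   slots: int = MAX_CONCURRENT_REPLICATION_FIXUPS) -> FixupOutcome:
    """Hypothetical model: each round, all in-flight jobs finish and
    up to `slots` new jobs are scheduled from the pending buckets."""
    pending = total_buckets
    completed = 0
    in_flight = min(slots, pending)   # time2: first batch is scheduled
    pending -= in_flight
    for _ in range(rounds_before_return):
        completed += in_flight        # the current batch finishes
        in_flight = min(slots, pending)
        pending -= in_flight
    # time3: the indexer returns; nothing further is scheduled.
    return FixupOutcome(completed, in_flight, pending)


outcome = simulate_fixup(total_buckets=1000, rounds_before_return=10)
print(outcome, outcome.excess_copies)
```

With 1000 buckets and 10 rounds before the indexer returns, this model reports 50 completed jobs, 5 still in flight, and 945 buckets that never get an extra copy.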


rbal_splunk · ‎11-27-2018

If network bandwidth is a concern, there is a new setting in 7.2 that caps how much bandwidth each indexer uses for “fixup” operations.

server.conf

max_nonhot_rep_kBps = <integer>
* This is the maximum throughput (kB(Bytes)/s) for warm/cold/summary 
  replications on a specific source peer. Similar to forwarder's maxKBps 
  setting in the limits.conf file.
* This setting throttles total bandwidth consumption for all 
  outgoing non-hot replication connections from a given source peer. 
  It does not throttle at the 'per-replication-connection', per-target 
  level.
* This setting is reloadable without restart if manually updated on the 
  source peers by using the command "splunk edit cluster-config" 
  or by making the corresponding REST call. We don't recommend updating 
  this setting across all the peers using bundle push because: 
    1) The push requires a rolling restart, as do all bundle pushes 
       with the server.conf file change.
    2) You might want to set different values on different peers.
* If set to 0, signifies unlimited throughput.
* Default: 0
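To get a feel for what a given cap means in practice, the arithmetic can be sketched as below. This is a rough lower bound only: `min_transfer_seconds` is a hypothetical helper, it assumes 1 GB = 1024 * 1024 kB, and it assumes the data is spread evenly across the source peers. Because the cap applies per source peer, the aggregate rate is taken as the cap multiplied by the number of source peers.

```python
def min_transfer_seconds(total_gb: float, cap_kBps: int, source_peers: int = 1) -> float:
    """Lower bound on the time to replicate `total_gb` of non-hot bucket data
    when each source peer is capped at `cap_kBps` (max_nonhot_rep_kBps)."""
    if cap_kBps <= 0:
        # 0 means unlimited throughput, so no bound can be derived from the cap.
        raise ValueError("cap_kBps must be positive; 0 means unlimited")
    if source_peers < 1:
        raise ValueError("source_peers must be at least 1")
    total_kb = total_gb * 1024 * 1024        # assumption: binary units
    aggregate_kBps = cap_kBps * source_peers  # the cap applies per source peer
    return total_kb / aggregate_kBps


# Example: 100 GB to replicate with a 10240 kB/s (10 MB/s) cap per peer.
print(min_transfer_seconds(100, 10240))                  # one source peer
print(min_transfer_seconds(100, 10240, source_peers=4))  # four source peers
```

Under these assumptions, 100 GB from a single source peer capped at 10240 kB/s takes at least 10240 seconds (just under 3 hours); with four source peers sharing the work, the bound drops to 2560 seconds.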
