Hi,
I'm getting the error below, and hot buckets are not being replicated across the indexers in the cluster.
08-23-2019 04:20:05.197 +0000 WARN BucketReplicator - Failed to replicate Streaming bucket bid=main~337~BB922979-04B2-49E0-AE94-B75588520776 to guid=EFEA22B6-1EAC-4505-8B01-DD2A57666CD7 host=splunk-indexer-b.xxx.svc.cluster.local s2sport=9887. Connection failed
When I telnet from indexer-a to indexer-b, the connection is closed immediately. However, if I telnet from localhost, the connection is established:
root@splunk-indexer-a:~# telnet splunk-indexer-b.ns.svc.cluster.local 9887
Trying 172.20.208.244...
Connected to splunk-indexer-b.ns.svc.cluster.local.
Escape character is '^]'.
Connection closed by foreign host.
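The telnet check above can be scripted, which helps when you want to run it from every indexer against every other one. This is a minimal Python sketch; the function name and its three result strings are hypothetical, not a Splunk tool. It opens a TCP connection to the replication port and tells "refused or unreachable" apart from "accepted, then closed right away", which is what the session from indexer-a shows.

```python
import socket


def probe_replication_port(host: str, port: int = 9887, timeout: float = 3.0) -> str:
    """Scripted version of the telnet check (hypothetical helper).

    Returns one of:
      "refused_or_unreachable": the TCP connection could not be opened at all
      "closed_by_peer": the connection opened, then the remote side closed it
      "open": the connection opened and was still open after `timeout` seconds
    """
    try:
        # create_connection also applies `timeout` to later recv() calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, DNS lookup failures and connect timeouts.
        return "refused_or_unreachable"
    with sock:
        try:
            # Send nothing and wait: b"" means the remote side sent FIN,
            # i.e. telnet's "Connection closed by foreign host."
            data = sock.recv(1)
        except socket.timeout:
            # Nothing received and nothing closed within `timeout`.
            return "open"
        except ConnectionResetError:
            return "closed_by_peer"
    return "closed_by_peer" if data == b"" else "open"
```

For example, probe_replication_port("splunk-indexer-b.ns.svc.cluster.local") run on indexer-a. Note that "open" only shows that something accepts TCP connections on that port; it does not prove splunkd will accept replication data from that peer.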
Thanks.
You have connectivity problems between your indexers. Check the firewalls between them, in particular on the replication port (9887 in your log).
Hi Richgalloway,
Yup, I've fixed the connectivity issue, but now I'm having a different issue.
The search factor (2) and the replication factor (2) are not always met. I have 3 indexers in my cluster, and all the connections look good. One thing I noticed is that there are always some fixup tasks running in the Bucket Status UI, with the fixup reason shown as "streaming failure - src=B621E78A-369C-42CC-B604-000174F54036 tgt=BB922979-04B2-49E0-AE94-B75588520776 failing=src"
After some time the fixup tasks complete successfully, but they appear again for new buckets, and so on. Because of this, I always see a "Data Durability" warning on the master. I recently enabled SmartStore in our cluster, and I'm not sure whether that is causing this issue.
Thanks.
Please share your server.conf and inputs.conf.
Hi,
Please find the requested config files below. I don't have a customized inputs.conf; I'm using the defaults only.
A few more points: I have a 3-indexer cluster, and today I reduced the replication/search factor to 2. Data durability now stays green for longer than before (with SF 3/RF 3). One of the peers always lags behind the other two in catching up with hot buckets.
Is there a way to set a wider time window for syncing the hot buckets?
server.conf on the cluster master:
[clustering]
cluster_label = idxc_label
mode = master
pass4SymmKey = 
search_factor = 3
replication_factor = 3
rebalance_threshold = 0.9
percent_peers_to_restart = 100
server.conf on the indexer peers:
[clustering]
master_uri = https://splunk-master.splunk.svc.cluster.local:8089
mode = slave
pass4SymmKey = 
register_replication_address = splunk-indexer-1.splunk.svc.cluster.local
max_replication_errors = 50

indexes.conf:
maxGlobalDataSizeMB = 50000
[default]
remotePath = volume:remote_store/$_index_name
repFactor = 0 
[main]
repFactor = auto
[_introspection]
repFactor = 0
[_audit]
repFactor = 0 
[_internal]
repFactor = 0 
[_telemetry]
repFactor = 0
[volume:remote_store]
storageType = remote
path = s3://splunk-smart-store-bucket
Hi, I had the same problem. I changed two settings in server.conf on all indexers,
from:
master_uri = https://splunk-master.splunk.svc.cluster.local:8089
mode = slave
to:
manager_uri = https://splunk-master.splunk.svc.cluster.local:8089
mode = peer
and then restarted Splunk. These changes worked for me.
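With many indexers, the two-setting change above can be scripted. This is a minimal Python sketch; migrate_clustering_settings is a hypothetical helper, not a Splunk tool. It only rewrites master_uri and mode = slave inside the [clustering] stanza and deliberately leaves other stanzas (for example a [license] master_uri) untouched. It assumes a Splunk version that accepts the manager_uri / mode = peer names (8.1 and later, as far as I know). Keep a backup of server.conf, and restart Splunk afterwards as described above.

```python
def migrate_clustering_settings(conf_text: str) -> str:
    """Rename master_uri -> manager_uri and mode = slave -> mode = peer,
    but only inside the [clustering] stanza (hypothetical helper)."""
    out = []
    stanza = None
    for line in conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # keep the original line ending
        stripped = body.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which stanza the following settings belong to.
            stanza = stripped[1:-1].strip()
        elif stanza == "clustering" and "=" in stripped and not stripped.startswith("#"):
            key, _, value = stripped.partition("=")
            key, value = key.strip(), value.strip()
            if key == "master_uri":
                line = f"manager_uri = {value}{eol}"
            elif key == "mode" and value == "slave":
                line = f"mode = peer{eol}"
        out.append(line)
    return "".join(out)
```

Reading the file, passing its text through the function and writing the result back is left to you; the function itself does no file I/O.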
@sankareds Could you please share some details on how exactly you fixed network connectivity issues to solve WARN BucketReplicator - Failed to replicate Streaming bucket?
We have exactly the same issue.
Your help will be much appreciated!
If I remember correctly, I added the Ansible task below to register the replication address. This way, each indexer registers an address on which the other peers can reach it for replication.
---
- include_tasks: ../../../roles/splunk_common/tasks/wait_for_splunk_instance.yml
  vars:
    splunk_instance_address: "{{ cluster_master_host }}"
- name: Set current node as indexer cluster peer
  command: "{{ splunk.exec }} edit cluster-config -register_replication_address $HOSTNAME.splunk.svc.cluster.local -mode slave -master_uri '{{ cert_prefix }}://{{ cluster_master_host }}:{{ splunk.svc_port }}' -replication_port {{ splunk.idxc.replication_port }} -secret '{{ splunk.idxc.secret }}' -auth '{{ splunk.admin_user }}:{{ splunk.password }}'"
  become: yes
  become_user: "{{ splunk.user }}"
  register: task_result
  changed_when: task_result.rc == 0
  until: task_result.rc == 0
  retries: "{{ retry_num }}"
  delay: 3
  ignore_errors: yes
  notify:
    - Restart the splunkd service
  no_log: "{{ hide_password }}"
As per the Splunk documentation:
register_replication_address = <IP address or fully qualified machine/domain name>
* Only valid for 'mode=peer'.
* This is the address on which a peer is available for accepting replication data. This is useful in the cases where a peer host machine has multiple interfaces and only one of them can be reached by another splunkd instance.
If you are not using splunk-ansible, you can set this value directly in server.conf. See the server.conf spec:
https://docs.splunk.com/Documentation/Splunk/8.1.1/Admin/Serverconf
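For a setup without Ansible, here is a minimal Python sketch that renders a peer's [clustering] stanza with register_replication_address built the same way as in the task above: the pod's short hostname plus the splunk.svc.cluster.local service domain used in this thread. The function name, its defaults and the <your cluster secret> placeholder are hypothetical. It uses the mode = peer / manager_uri names from the 8.1 documentation; on older versions you would keep mode = slave and master_uri as in the Ansible task. The [replication_port://9887] stanza assumes the replication port 9887 seen in the original error.

```python
import socket
from typing import Optional


def render_peer_clustering_conf(
    manager_uri: str,
    service_domain: str = "splunk.svc.cluster.local",
    replication_port: int = 9887,
    hostname: Optional[str] = None,
) -> str:
    """Build server.conf stanzas for one indexer peer (hypothetical helper).

    The replication address is <short hostname>.<service_domain>, mirroring
    $HOSTNAME.splunk.svc.cluster.local in the Ansible task.
    """
    host = hostname or socket.gethostname().split(".")[0]
    fqdn = f"{host}.{service_domain}"
    return (
        "[clustering]\n"
        "mode = peer\n"
        f"manager_uri = {manager_uri}\n"
        "pass4SymmKey = <your cluster secret>\n"
        f"register_replication_address = {fqdn}\n"
        "\n"
        f"[replication_port://{replication_port}]\n"
    )
```

On the pod splunk-indexer-1, render_peer_clustering_conf("https://splunk-master.splunk.svc.cluster.local:8089") would produce register_replication_address = splunk-indexer-1.splunk.svc.cluster.local, matching the peer config posted earlier in the thread. Merge the output into server.conf rather than overwriting the whole file.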
