Failed to replicate Streaming bucket

sankareds
Explorer

Hi,

I'm getting the error below, and hot buckets are not being replicated across the indexers in my cluster.

08-23-2019 04:20:05.197 +0000 WARN BucketReplicator - Failed to replicate Streaming bucket bid=main~337~BB922979-04B2-49E0-AE94-B75588520776 to guid=EFEA22B6-1EAC-4505-8B01-DD2A57666CD7 host=splunk-indexer-b.xxx.svc.cluster.local s2sport=9887. Connection failed

When I telnet from indexer-a to indexer-b, the connection is closed immediately. However, if I telnet from localhost, the connection is established.

root@splunk-indexer-a:~# telnet splunk-indexer-b.ns.svc.cluster.local 9887
Trying 172.20.208.244...
Connected to splunk-indexer-b.ns.svc.cluster.local.
Escape character is '^]'.
Connection closed by foreign host.

Thanks.

richgalloway
SplunkTrust

You have connectivity problems between your indexers. Check your firewalls.
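To narrow down where the connection breaks, a small probe along these lines may help. This is only a sketch: `parse_target` and `probe` are hypothetical helper names, the regular expression assumes the `host=` and `s2sport=` fields look exactly like the ones in the WARN line quoted above, and the three result labels are my own. It tries to tell apart a port that cannot be reached at all from one that accepts the connection and then drops it straight away, which is what the telnet output above shows.

```python
import re
import socket

# Matches the host= and s2sport= fields as they appear in the quoted WARN line.
TARGET_RE = re.compile(r"host=(?P<host>\S+)\s+s2sport=(?P<port>\d+)")


def parse_target(log_line):
    """Return (host, port) from a BucketReplicator warning, or None if absent."""
    match = TARGET_RE.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def probe(host, port, timeout=3.0):
    """Classify a TCP endpoint as 'unreachable', 'closed' or 'open'.

    'unreachable': the connection could not be established (DNS, firewall, refused).
    'closed':      the connection was established but the remote side ended it at once.
    'open':        the connection stayed up for the whole timeout, or data arrived.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Nothing was sent, but the peer kept the connection open.
            return "open"
        except OSError:
            # For example, a connection reset by the remote side.
            return "closed"
    # An empty read means the peer closed the connection cleanly.
    return "open" if data else "closed"
```

Run it from each indexer against every other indexer, for example `probe(*parse_target(line))` with a warning line from splunkd.log. An "open" result only shows that the port accepts and holds a TCP connection; it does not prove that replication itself will succeed.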


sankareds
Explorer

Hi Richgalloway,

Yup, I've fixed the connectivity issue, and now I'm having a different issue.

The search factor (2) and replication factor (2) are not always met. I have 3 indexers in my cluster and all the connections look good. One thing I noticed is that there are always some fixup tasks running in the bucket status UI, with the fixup reason shown as "streaming failure - src=B621E78A-369C-42CC-B604-000174F54036 tgt=BB922979-04B2-49E0-AE94-B75588520776 failing=src".

After some time the fixup tasks complete successfully, but they appear again for new buckets, and so on. Because of this behavior I always see a "Data Durability" warning on the master. I recently enabled SmartStore in our cluster, and I'm not sure whether that is causing this issue.

Thanks.


jkat54
SplunkTrust

Please share your server.conf and inputs.conf.


sankareds
Explorer

Hi,

Please find the requested config files below. I don't have a customized inputs.conf and am using the defaults only.

A few more points: I have a 3-indexer cluster, and today I reduced the replication/search factor to 2, which keeps data durability green for longer than before (SF 3 / RF 3). One of the peers always lags behind the other two in catching up with hot buckets.

Is there a way to set a wider time window for syncing the hot buckets?

cluster master [server.conf]

[clustering]
cluster_label = idxc_label
mode = master
pass4SymmKey =
search_factor = 3
replication_factor = 3
rebalance_threshold = 0.9
percent_peers_to_restart = 100

indexers [server.conf]

[clustering]
master_uri = https://splunk-master.splunk.svc.cluster.local:8089
mode = slave
pass4SymmKey =
register_replication_address = splunk-indexer-1.splunk.svc.cluster.local
max_replication_errors = 50

indexers [indexes.conf]

maxGlobalDataSizeMB = 50000

[default]
remotePath = volume:remote_store/$_index_name
repFactor = 0

# only replicating main index

[main]
repFactor = auto

[_introspection]
repFactor = 0

[_audit]
repFactor = 0

[_internal]
repFactor = 0

[_telemetry]
repFactor = 0

[volume:remote_store]
storageType = remote
path = s3://splunk-smart-store-bucket


MaroofaTanweer
New Member

Hi, I had the same problem. I changed two settings in the server.conf file on all indexers:

From:

master_uri = https://splunk-master.splunk.svc.cluster.local:8089
mode = slave

To:

manager_uri = https://splunk-master.splunk.svc.cluster.local:8089
mode = peer

and restarted Splunk. These changes worked for me.
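If you have many indexers, the same two edits can be scripted. Below is a minimal Python sketch of that rename; `modernize_clustering` is a hypothetical helper name, it only touches settings inside the `[clustering]` stanza, and it only covers the two changes described above (`master_uri` to `manager_uri`, `mode = slave` to `mode = peer`). Whether the newer names are accepted depends on your Splunk version, so check the server.conf spec for your release before applying it.

```python
import re

# A stanza header such as "[clustering]".
STANZA_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
# A "key = value" line; comment lines starting with '#' do not match.
SETTING_RE = re.compile(r"^\s*(?P<key>[^=\s#]+)\s*=\s*(?P<value>.*?)\s*$")


def modernize_clustering(conf_text):
    """Return conf_text with the two renames applied inside [clustering] only."""
    out = []
    in_clustering = False
    for line in conf_text.splitlines():
        stanza = STANZA_RE.match(line)
        if stanza:
            # Track which stanza the following settings belong to.
            in_clustering = stanza.group("name") == "clustering"
        elif in_clustering:
            setting = SETTING_RE.match(line)
            if setting:
                key, value = setting.group("key"), setting.group("value")
                if key == "master_uri":
                    line = f"manager_uri = {value}"
                elif key == "mode" and value == "slave":
                    line = "mode = peer"
        out.append(line)
    return "\n".join(out) + "\n"
```

All other lines, including comments and other stanzas, pass through unchanged. As in the post above, the indexers still need a restart afterwards.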


justynap_ldz
Path Finder

@sankareds Could you please share some details on how exactly you fixed network connectivity issues to solve WARN BucketReplicator - Failed to replicate Streaming bucket?
We have exactly the same issue.

Your help will be much appreciated!


sankareds
Explorer

Hi @justynap_ldz 

If I remember correctly, I added the Ansible task below to register the replication address. This way the indexers can trust each other's replication requests.

---
- include_tasks: ../../../roles/splunk_common/tasks/wait_for_splunk_instance.yml
  vars:
    splunk_instance_address: "{{ cluster_master_host }}"

- name: Set current node as indexer cluster peer
  command: "{{ splunk.exec }} edit cluster-config -register_replication_address $HOSTNAME.splunk.svc.cluster.local -mode slave -master_uri '{{ cert_prefix }}://{{ cluster_master_host }}:{{ splunk.svc_port }}' -replication_port {{ splunk.idxc.replication_port }} -secret '{{ splunk.idxc.secret }}' -auth '{{ splunk.admin_user }}:{{ splunk.password }}'"
  become: yes
  become_user: "{{ splunk.user }}"
  register: task_result
  changed_when: task_result.rc == 0
  until: task_result.rc == 0
  retries: "{{ retry_num }}"
  delay: 3
  ignore_errors: yes
  notify:
    - Restart the splunkd service
  no_log: "{{ hide_password }}"

As per the Splunk documentation:

register_replication_address = <IP address or fully qualified machine/domain name>
* Only valid for 'mode=peer'.
* This is the address on which a peer is available for accepting
  replication data. This is useful in the cases where a peer host machine
  has multiple interfaces and only one of them can be reached by another
  splunkd instance


If you are not using splunk-ansible, you can set this value directly in server.conf.

https://docs.splunk.com/Documentation/Splunk/8.1.1/Admin/Serverconf
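For the non-Ansible route, here is a minimal Python sketch that renders the stanza for one peer. `replication_stanza` is a hypothetical helper name, and the default domain is simply the `splunk.svc.cluster.local` suffix used in the task above; substitute whatever name your peers can actually resolve and reach.

```python
def replication_stanza(pod_hostname, domain="splunk.svc.cluster.local"):
    """Return a [clustering] fragment advertising this peer's replication address.

    The address is built the same way as in the Ansible task above:
    <hostname>.<domain>. Both parts are assumptions about your environment.
    """
    address = f"{pod_hostname}.{domain}"
    return f"[clustering]\nregister_replication_address = {address}\n"
```

For example, `replication_stanza("splunk-indexer-1")` yields the `register_replication_address = splunk-indexer-1.splunk.svc.cluster.local` line from the indexer config posted earlier in this thread. Merge it into the existing `[clustering]` stanza rather than adding a second one, then restart splunkd.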
