We are building a new indexer cluster and getting search and replication factor errors

sathwik067
Explorer

Hello all,

We are trying to build a new indexer cluster with a new cluster master. We installed Splunk on all the servers and joined the indexers to the cluster master. After completing the setup, we are getting search factor and replication factor errors with the warning message below. We checked port connectivity between all the indexers and the cluster master, and everything connects, but we still get this warning. We also tried cleaning the eventdata as suggested in one of the posts, but that did not work either. If anyone has faced and resolved this type of issue, please let me know; that would be very helpful. Let me know if you need any more info.

We have search factor and replication factor = 2 with three indexers.

Search peer abcd.com has the following message: Too many bucket replication errors to target peer=xx.xx.xx.xx:8080. Will stop streaming data from hot buckets to this target while errors persist. Check for network connectivity from the cluster peer reporting this issue to the replication port of target peer. If this condition persists, you can temporarily put that peer in manual detention.

Thanks.

sathwik067
Explorer

Hello all,

 

The problem was that the MTU on the 1 Gb bonded network interface was set to 9000 on our new indexers. We changed it to 1500, and that fixed the search factor and replication factor errors.

 

Thanks.
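
For anyone hitting the same symptom: a plain TCP connect test can succeed even when the MTU is mismatched, because the handshake uses small packets, while bucket replication sends full-size ones. Below is a minimal sketch, assuming Linux with iputils ping, for checking whether a given MTU survives the path between two peers. The function names and the interface name bond0 are hypothetical and not taken from this thread.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three echo requests with the Don't Fragment bit set (iputils "-M do").
# A non-zero exit status suggests some hop cannot carry packets of that MTU.
probe_mtu() {
    peer=$1   # peer hostname or IP (assumption: ICMP is allowed between peers)
    mtu=$2    # MTU to test, e.g. 9000 or 1500
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$peer"
}

# Print the MTU currently configured on a local interface (Linux sysfs).
show_mtu() {
    cat "/sys/class/net/$1/mtu"
}
```

Example use from one indexer: run `show_mtu bond0`, then `probe_mtu <peer name/ip> 9000` and `probe_mtu <peer name/ip> 1500`. If the 9000 probe fails while the 1500 probe works, jumbo frames are probably not supported end to end on that path.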

richgalloway
SplunkTrust

Have you checked the connectivity among the individual indexers?  Replication is direct from indexer to indexer - not via the CM - so it's critical for an indexer to be able to connect to all other indexers and not just the CM.

sathwik067
Explorer

Hello,

Thanks for the response. Yes, we have checked the connectivity between the indexers as well and the ports are connected between the indexers.

soutamo
SplunkTrust
What error messages did you find in splunkd.log on those servers?
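
One possible way to pull the relevant lines out, as a rough sketch: the helper name replication_errors is hypothetical, and the component names in the pattern are only a guess at what is useful for a replication problem.

```shell
#!/bin/sh
# Keep only log lines that mention replication-related components.
# Reads the files given as arguments, or standard input if none are given.
replication_errors() {
    grep -E 'TcpInputProc|BucketReplicator' "$@"
}
```

On each indexer this could be run as `replication_errors "$SPLUNK_HOME/var/log/splunk/splunkd.log" | tail -n 20`, assuming the default log location.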
sathwik067
Explorer

Hello,

Thanks for the response. Below are some of the errors we are seeing on the indexers:

"ERROR TcpInputProc - Error encountered for connection from src=xx.xx.xx.xx:xxxx. Read Timeout Timed out after 600 seconds."
"BucketReplicator - Failed to replicate warm bucket bid=_internal~xx to guid=ABCD host=xx.xx.xx.xx s2sport=8080. Read timed out after 180 secs."
 
Missing enough suitable candidates to create searchable copy in order to meet replication policy. Missing={ default:1 
Waiting 'target_wait_time' before search factor fixup
Cannot fix search count as the bucket hasn't rolled yet. 

 

Search peer abcd.com has the following message: Too many bucket replication errors to target peer=xx.xx.xx.xx:8080. Will stop streaming data from hot buckets to this target while errors persist. Check for network connectivity from the cluster peer reporting this issue to the replication port of target peer. If this condition persists, you can temporarily put that peer in manual detention

soutamo
SplunkTrust
What do you get if you try this from one peer to another?
curl -v telnet://<peer name/ip>:8080
sathwik067
Explorer

It connects:
curl -v telnet://xx.xx.xx.xx:8080
* About to connect() to xx.xx.xx.xx port 8080 (#0)
* Trying xx.xx.xx.xx...
* Connected to xx.xx.xx.xx (xx.xx.xx.xx) port 8080 (#0)
