Error: slave has more than 10 number of continous replication failures. Look at splunkd.log for details on why this is happening.

Hello,

I have set up a distributed search environment. On my master node I'm getting the error that the replication factor and search factor are not met, because of the following error on my indexers: "Error: slave has more than 10 number of continous replication failures. Look at splunkd.log for details on why this is happening."

So, does anyone know how to solve this problem? I have checked the connectivity between the indexers, and they can reach each other. What else could it be? Any ideas?

Best regards.


All events below share host = indexer1, source = /opt/splunk/var/log/splunk/splunkd.log, sourcetype = splunkd.

12-06-2019 16:03:19.216 +0100 INFO CMRepJob - job=CMReplicationErrorJob bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 failingGuid=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 srcGuid=42C2AF4B-A702-44F4-B450-1D0156387DE8 tgtGuid=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 succeeded

12-06-2019 16:03:19.214 +0100 INFO CMSlave - bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 src=42C2AF4B-A702-44F4-B450-1D0156387DE8 tgt=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 failing=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 queued replication error job

12-06-2019 16:03:19.214 +0100 INFO CMReplicationRegistry - Finished replication: bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 src=42C2AF4B-A702-44F4-B450-1D0156387DE8 target=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50

12-06-2019 16:03:19.214 +0100 WARN BucketReplicator - Failed to replicate warm bucket bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 to guid=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 host=172.28.128.18 s2sport=9887. Connection failed

12-06-2019 16:03:19.214 +0100 INFO BucketReplicator - Discarding replication data as QueueRef=guid=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 host=172.28.128.18 s2sport=9887 bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 is deleted

12-06-2019 16:03:19.214 +0100 WARN BucketReplicator - Connection failed

12-06-2019 16:03:19.214 +0100 ERROR TcpOutputFd - Connection to host=172.28.128.18:9887 failed

12-06-2019 16:03:19.214 +0100 WARN TcpOutputFd - Connect to 172.28.128.18:9887 failed. No route to host

12-06-2019 16:03:19.214 +0100 INFO BucketReplicator - event=localReplicationFinished type=warm bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8

12-06-2019 16:03:19.214 +0100 INFO BucketReplicator - event=finishBucketReplication bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 [et=1575639605 lt=1575639665 type=2]

12-06-2019 16:03:19.214 +0100 INFO BucketReplicator - Replicating warm bucket bid=audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 node=guid=4CAFFC9C-BD32-437D-B603-DFA3BD15DC50 host=172.28.128.18 s2sport=9887 bid=audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8

12-06-2019 16:03:19.214 +0100 INFO BucketReplicator - Starting replication of bucket bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8 to 172.28.128.18:9887;

12-06-2019 16:03:19.214 +0100 INFO BucketReplicator - event=startBucketReplication bucket bid=_audit~91~42C2AF4B-A702-44F4-B450-1D0156387DE8
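The "WARN TcpOutputFd - Connect to 172.28.128.18:9887 failed. No route to host" line points at a network or firewall problem rather than at Splunk itself. One quick way to confirm is a plain TCP connect to the peer's s2s/replication port; the sketch below assumes the IP and port shown in the logs (adjust for your peers):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, timeouts, and "no route to host"
        return False

if __name__ == "__main__":
    # IP and s2s port taken from the splunkd.log entries above.
    print(port_reachable("172.28.128.18", 9887, timeout=2.0))
```

If this returns False from one indexer to another, the problem is in routing or firewall rules, not in the cluster configuration.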

Splunk Employee

Can you provide the splunkd logs?


I have found the solution. There is a replication port used for communication between the indexers in a cluster. I had forgotten to add a firewall rule for this port. After I added the rule, replication worked.
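For anyone hitting the same error: each cluster peer advertises its replication port in server.conf (9887 in the logs above; yours may differ), and that port must be reachable between all peers. A minimal sketch of the relevant stanza:

```ini
# $SPLUNK_HOME/etc/system/local/server.conf on each indexer (cluster peer)
[replication_port://9887]
```

The firewall on each peer then has to allow inbound TCP on that port from the other peers, e.g. `firewall-cmd --permanent --add-port=9887/tcp` followed by `firewall-cmd --reload` on firewalld-based systems (adjust for your firewall).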


But anyway, thank you for your interest.
