Solved: error in clustering and replication

satyaallaparthi · 05-12-2019

Hello,

I am getting the following error on my deployment server or cluster master, even though outputs.conf is correct.

outputs.conf:

[tcpout]
defaultGroup = indexers
[tcpout:indexers]
server = IDX1:9997, IDX2:9997
autoLB = true
forceTimebasedAutoLB = true
autoLBFrequency = 40
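To double-check which receivers this stanza actually points at, here is a minimal Python sketch that parses the file with the standard-library configparser. It assumes Splunk's .conf syntax is close enough to INI for this particular stanza (it is not a perfect match in general), and destinations() is a hypothetical helper, not part of any Splunk tooling.

```python
import configparser
from typing import List, Tuple

# The outputs.conf from this post, embedded so the sketch runs standalone.
OUTPUTS_CONF = """\
[tcpout]
defaultGroup = indexers
[tcpout:indexers]
server = IDX1:9997, IDX2:9997
autoLB = true
forceTimebasedAutoLB = true
autoLBFrequency = 40
"""


def destinations(conf_text: str) -> List[Tuple[str, int]]:
    """Return (host, port) pairs for every target group named in defaultGroup."""
    # Only '=' separates keys from values; ':' appears inside host:port values.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep camelCase keys such as defaultGroup
    parser.read_string(conf_text)
    pairs = []
    for group in parser["tcpout"]["defaultGroup"].split(","):
        stanza = parser["tcpout:" + group.strip()]
        for entry in stanza["server"].split(","):
            host, _, port = entry.strip().rpartition(":")
            pairs.append((host, int(port)))
    return pairs


print(destinations(OUTPUTS_CONF))  # [('IDX1', 9997), ('IDX2', 9997)]
```

Each (host, port) pair can then be tested individually with telnet or nc from the forwarding host.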

Please help me with the error. Any help would be appreciated. Thanks!

TCPOutAutoLB-0
Root Cause(s):
More than 20% of forwarding destinations have failed. Ensure your hosts and ports in outputs.conf are correct. Also ensure that the indexers are all running, and that any SSL certificates being used for forwarding are correct.
Last 50 related messages:
05-12-2019 19:30:02.091 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
05-12-2019 19:26:33.000 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
05-12-2019 19:25:33.180 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
05-12-2019 19:23:03.692 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
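The root cause is stated as a ratio: more than 20% of forwarding destinations failing. All four warnings name the same receiver, which suggests only one of the two configured indexers is timing out, and that alone is 1/2 = 50%. A minimal sketch of that arithmetic, assuming the indicator simply compares failed destinations with configured ones (Splunk's real evaluation window and threshold handling are not modeled here; the helper names are hypothetical):

```python
import re
from typing import Set

# Log lines copied from the health report above.
LOG = """\
05-12-2019 19:30:02.091 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
05-12-2019 19:26:33.000 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
05-12-2019 19:25:33.180 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
05-12-2019 19:23:03.692 -0400 WARN TcpOutputProc - Cooked connection to ip=10.184.132.110:9997 timed out
"""

# Captures the ip:port that follows "ip=" in each timeout warning.
TIMEOUT_RE = re.compile(r"Cooked connection to ip=(\S+?) timed out")


def failing_destinations(log_text: str) -> Set[str]:
    """Distinct ip:port values that appear in 'timed out' warnings."""
    return set(TIMEOUT_RE.findall(log_text))


def failed_fraction(failing: Set[str], configured: int) -> float:
    """Share of configured destinations that are failing."""
    return len(failing) / configured


failing = failing_destinations(LOG)
fraction = failed_fraction(failing, configured=2)  # IDX1 and IDX2 in outputs.conf
print(failing, fraction, fraction > 0.20)  # {'10.184.132.110:9997'} 0.5 True
```

Under this reading, the alert does not mean both indexers are unreachable; it is worth identifying which of IDX1/IDX2 resolves to 10.184.132.110.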

DavidHourani · 05-13-2019

Hi @satyaallaparthi,

Seems like your configuration's working but the receiving indexers are either not listening on port 9997 or a firewall is blocking the traffic in between.

Have a look there and see. An easy way to debug this is to run telnet 10.184.132.110 9997 or nc 10.184.132.110 9997 from your forwarder and check whether the connection times out.
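The same check can be scripted so every receiver is tested in one go. This minimal sketch uses Python's socket.create_connection with a timeout; like telnet or nc, it only proves that the TCP handshake succeeds, not that splunkd is accepting forwarded data on that port. port_open() is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Like telnet or nc, this proves only that the TCP handshake works,
    not that a Splunk receiver is actually reading from the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # timeouts, refused connections and DNS errors are all OSErrors
        return False


# Usage (run from the forwarding host): port_open("10.184.132.110", 9997)
```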

Cheers,
David

oscar84x · 05-14-2019

Do you have any errors in splunkd.log?

satyaallaparthi · 05-13-2019

Hello @DavidHourani,

I did try telnet before I posted. Everything seems fine and the hosts are communicating with each other, but I am still getting the problem. :'(

Thanks,

koshyk · 05-13-2019

Before looking at outputs.conf, you need to verify that SSL/TLS connectivity between the indexer slaves and the CLM (cluster master) is good.
I've made some new templates for clustering, which can be found here (I mainly use them for Docker).
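A successful telnet only shows that a TCP connection opens; it says nothing about the TLS layer. Below is a minimal sketch of a TLS handshake test using Python's ssl module. With cafile=None it disables certificate verification and only checks that the peer speaks TLS; passing the cluster's root CA file (an assumed path, such as the rootCA.pem in the template) also verifies the chain. Hostname checking is switched off on the assumption that internal cluster certificates may not carry matching names. tls_handshake() is a hypothetical helper.

```python
import socket
import ssl
from typing import Optional, Tuple


def tls_handshake(host: str, port: int, cafile: Optional[str] = None,
                  timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake with host:port and describe the outcome."""
    ctx = ssl.create_default_context(cafile=cafile)
    # Assumption: internal cluster certs may not match the host name, so skip that check.
    ctx.check_hostname = False
    if cafile is None:
        # No CA given: only test that the peer speaks TLS, do not verify its certificate.
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"{tls.version()} {tls.cipher()[0]}"
    except OSError as exc:  # ssl.SSLError and socket timeouts are both OSErrors
        return False, repr(exc)
```

For example, calling tls_handshake with your cluster master's host name and port 8089 (splunkd's management port, HTTPS by default) should return True; a False result with an SSL error points at certificates or TLS settings rather than the network.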

A sample server.conf on the CLM should look like this:

[general]
site = site0

#indexer clustering
[clustering]
mode = master 
pass4SymmKey = my_pass
cluster_label = my_idx_cluster1
multisite = true
replication_factor = 2
search_factor = 2
site_replication_factor = origin:1, total:2
site_search_factor = origin:1, total:2
available_sites = site1, site2

[indexer_discovery]
pass4SymmKey = my_pass
polling_rate = 300

and on the indexer slaves like this:

#indexer clustering
[clustering]
mode = slave
master_uri=https://CLUSTER_MASTER_URI:8089
pass4SymmKey = my_pass
cluster_label = my_idx_cluster1

[replication_port-ssl://{{replicationPort}}]
disabled = false
rootCA = $SPLUNK_HOME/etc/apps/MY_CERT_APP/bin/auth/rootCA.pem
serverCert = $SPLUNK_HOME/etc/apps/MY_CERT_APP/bin/auth/device.pem
sslPassword = my_pass
requireClientCert = false
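Peers can only register with the master when they share its pass4SymmKey, so comparing the [clustering] stanzas side by side is a cheap sanity check; cluster_label is compared as well because the template sets it on both sides. This sketch again assumes configparser can read these stanzas, and the helper names are hypothetical. One caveat: Splunk normally encrypts a plaintext pass4SymmKey in server.conf on restart, and the encrypted strings differ between hosts, so compare the plaintext values you deployed rather than the files on a running instance.

```python
import configparser
from typing import Dict, List, Sequence

# [clustering] stanzas copied from the master and peer templates in this post.
MASTER_CONF = """\
[clustering]
mode = master
pass4SymmKey = my_pass
cluster_label = my_idx_cluster1
"""

PEER_CONF = """\
[clustering]
mode = slave
master_uri = https://CLUSTER_MASTER_URI:8089
pass4SymmKey = my_pass
cluster_label = my_idx_cluster1
"""


def clustering_settings(conf_text: str) -> Dict[str, str]:
    """Return the [clustering] stanza of a server.conf text as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep the case of keys such as pass4SymmKey
    parser.read_string(conf_text)
    return dict(parser["clustering"])


def mismatched_keys(master_text: str, peer_text: str,
                    keys: Sequence[str] = ("pass4SymmKey", "cluster_label")) -> List[str]:
    """List the shared keys whose values differ between master and peer."""
    master = clustering_settings(master_text)
    peer = clustering_settings(peer_text)
    return [key for key in keys if master.get(key) != peer.get(key)]


print(mismatched_keys(MASTER_CONF, PEER_CONF))  # []
```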

satyaallaparthi · 05-13-2019

Hello @koshyk,

Everything looks exactly the same in my server.conf, and I ran telnet from my CM to the slaves. Everything is communicating properly, but I am still getting the above error on the CM.
