New Search head unable to communicate with master

walsborn
Path Finder

I'm adding a new search head to my cluster and keep receiving the error: CMSearchHead - 'Unable to reach master' for master=https://xxx:8089.

I'm running version 7.3.4 with a search head cluster of 3 members, which I'm trying to grow to 4, plus 6 clustered indexers. I have a deployment server and a search head deployer. The cluster master acts as the master for all nodes except the forwarders. The nodes sit behind an F5 load balancer; all nodes are in the same pool and are communicating. The new search head can communicate with the indexers but not with the master, although I can see on the master that the search head has checked in and its details are listed.

Initialization of the search head went fine, and the pass4SymmKey values have been replaced throughout the environment so that they all match.

I have changed CMSearchHead logging to DEBUG, but I get no additional information about the error.

Any great ideas to try?

isoutamo
SplunkTrust

Hi,

Does this work?

curl -kv https://xxxx:8089

If not, it's probably a firewall issue. If it works, the cause is something else.

Is there anything else in splunkd.log (on the CM and on the new SHC member)?

Have you already added it to the SHC as a member?

r. Ismo

walsborn
Path Finder

Hello @isoutamo,

Thanks for the quick response.

curl -kv https://xxx:8089 was successful and returned all the details.

On the master I can see a few things wrong, but nothing associated with the new search head.

"WARN  DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked

 ERROR DigestProcessor - Failed signature match

ERROR LMHttpUtil - Failed to verify HMAC signature, uri: /services/cluster/master/info/?output_mode=json

WARN  DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked

ERROR DigestProcessor - Failed signature match"
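The DigestProcessor and LMHttpUtil errors quoted above are signature failures, which is the kind of message a pass4SymmKey mismatch between two instances typically produces. A minimal sketch, assuming splunkd.log lines end in the "LEVEL Component - message" shape quoted above, that tallies signature-related errors per component so they stand out from the unrelated warnings. The function name and the sample list are illustrative and not part of Splunk:

```python
import re
from collections import Counter

# Matches the "LEVEL  Component - message" tail of a splunkd.log line.
# Assumption: any timestamp prefix is simply skipped by search().
LINE_RE = re.compile(r"(?P<level>ERROR|WARN)\s+(?P<component>\w+) - (?P<message>.*)")

def signature_errors(lines):
    """Count ERROR lines that mention a signature failure, per component."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group("level") != "ERROR":
            continue
        if "signature" in match.group("message").lower():
            counts[match.group("component")] += 1
    return counts

# The lines quoted in the post above.
sample = [
    "WARN  DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked",
    " ERROR DigestProcessor - Failed signature match",
    "ERROR LMHttpUtil - Failed to verify HMAC signature, uri: /services/cluster/master/info/?output_mode=json",
    "WARN  DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked",
    "ERROR DigestProcessor - Failed signature match",
]

for component, count in signature_errors(sample).items():
    print(component, count)
```

The log excerpt does not say which instance sent the rejected requests, so this only narrows down the kind of problem, not its source.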

 

I have added the search head to the cluster.  The initialization of the search head to the cluster went well. 

isoutamo
SplunkTrust

Hi,

Can you share your server.conf from this new SHC member?

What is the output of "splunk show shcluster-status --verbose" and "splunk show kvstore-status"?

Are you using automatic replication of search peers on the SHC, or do you add them manually, member by member?

r. Ismo

walsborn
Path Finder

Hi @isoutamo,

server.conf from new search head:

[sslConfig]
#sslKeysfilePassword = xxx
sslPassword = $xxx
sslVersions = *,-ssl2
sslVersionsForClient = *,-ssl2
cipherSuite = TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH

[lmpool:auto_generated_pool_download-trial]
description = auto_generated_pool_download-trial
quota = MAX
slaves = *
stack_id = download-trial

[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder

[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free

[general]
serverName = new search head
pass4SymmKey = xxx

[replication_port://9996]

[shclustering]
conf_deploy_fetch_url = https://search head deployer:8089
disabled = false
mgmt_uri = https://splunksh-p1n01:8089
pass4SymmKey = xxx
replication_factor = 2
manual_detention = off

 

Output of splunk show shcluster-status --verbose:

dynamic_captain : 1
elected_captain : Thu Jul 30 22:22:30 2020
initialized_flag : 1
label : splunksh-p1n03
mgmt_uri : https://splunksh-p1n03:8089
min_peers_joined_flag : 1
rolling_restart_flag : 0
service_ready_flag : 1

Members:

splunksh-p1n03
    label : splunksh-p1n03
    mgmt_uri : https://splunksh-p1n03:8089
    mgmt_uri_alias : https://xx:8089
    status : Up

splunksh-p1n02
    label : splunksh-p1n02
    last_conf_replication : Mon Aug  3 07:08:21 2020
    mgmt_uri : https://splunksh-p1n02:8089
    mgmt_uri_alias : https://xx:8089
    status : Up

splunksh-p1n04
    label : splunksh-p1n04
    last_conf_replication : Mon Aug  3 07:08:21 2020
    mgmt_uri : https://splunksh-p1n04.xx:8089
    mgmt_uri_alias : https://xx:8089
    status : Up

splunksh-p1n01
    label : splunksh-p1
    last_conf_replication : Mon Aug  3 07:08:21 2020
    mgmt_uri : https://xx8089
    mgmt_uri_alias : https://xx:8089
    status : Up

 

The replication factor is set to 2, and replication is automatic.

isoutamo
SplunkTrust

Do you have a separate app for the clustering configuration, or is it missing?

Can you run:

splunk btool server list clustering --debug
walsborn
Path Finder

An app is in place for search head clustering.

Btool Output:

/opt/splunk/etc/apps/cluster_search_base/default/server.conf [clustering]
/opt/splunk/etc/system/default/server.conf                   access_logging_for_heartbeats = false
/opt/splunk/etc/system/default/server.conf                   allow_default_empty_p4symmkey = true
/opt/splunk/etc/system/default/server.conf                   allowed_hbmiss_count = 3
/opt/splunk/etc/system/default/server.conf                   auto_rebalance_primaries = true
/opt/splunk/etc/system/default/server.conf                   available_sites =
/opt/splunk/etc/system/default/server.conf                   backup_and_restore_primaries_in_maintenance = false
/opt/splunk/etc/system/default/server.conf                   buckets_per_addpeer = 1000
/opt/splunk/etc/system/default/server.conf                   buckets_to_summarize = primaries
/opt/splunk/etc/system/default/server.conf                   commit_retry_time = 300
/opt/splunk/etc/system/default/server.conf                   constrain_singlesite_buckets = true
/opt/splunk/etc/system/default/server.conf                   cxn_timeout = 60
/opt/splunk/etc/system/default/server.conf                   decommission_force_finish_idle_time = 0
/opt/splunk/etc/system/default/server.conf                   decommission_node_force_timeout = 300
/opt/splunk/etc/system/default/server.conf                   decommission_search_jobs_wait_secs = 180
/opt/splunk/etc/system/default/server.conf                   deferred_cluster_status_update = true
/opt/splunk/etc/system/default/server.conf                   enableS2SHeartbeat = true
/opt/splunk/etc/system/default/server.conf                   executor_workers = 10
/opt/splunk/etc/system/default/server.conf                   generation_poll_interval = 5
/opt/splunk/etc/system/default/server.conf                   heartbeat_period = 1
/opt/splunk/etc/system/default/server.conf                   heartbeat_timeout = 60
/opt/splunk/etc/system/default/server.conf                   idle_connections_pool_size = -1
/opt/splunk/etc/system/default/server.conf                   local_executor_workers = 10
/opt/splunk/etc/system/default/server.conf                   maintenance_mode = false
/opt/splunk/etc/system/default/server.conf                   manual_detention = off
/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one
/opt/splunk/etc/system/default/server.conf                   max_auto_service_interval = 30
/opt/splunk/etc/system/default/server.conf                   max_fixup_time_ms = 5000
/opt/splunk/etc/system/default/server.conf                   max_nonhot_rep_kBps = 0
/opt/splunk/etc/system/default/server.conf                   max_peer_build_load = 2
/opt/splunk/etc/system/default/server.conf                   max_peer_rep_load = 5
/opt/splunk/etc/system/default/server.conf                   max_peer_sum_rep_load = 5
/opt/splunk/etc/system/default/server.conf                   max_peers_to_download_bundle = 5
/opt/splunk/etc/system/default/server.conf                   max_primary_backups_per_service = 10
/opt/splunk/etc/system/default/server.conf                   max_replication_errors = 3
/opt/splunk/etc/apps/cluster_search_base/default/server.conf mode = searchhead
/opt/splunk/etc/system/default/server.conf                   multisite = false
/opt/splunk/etc/system/default/server.conf                   notify_scan_min_period = 10
/opt/splunk/etc/system/default/server.conf                   notify_scan_period = 10
/opt/splunk/etc/system/default/server.conf                   pass4SymmKey =
/opt/splunk/etc/system/default/server.conf                   percent_peers_to_restart = 10
/opt/splunk/etc/system/default/server.conf                   quiet_period = 60
/opt/splunk/etc/system/default/server.conf                   rcv_timeout = 60
/opt/splunk/etc/system/default/server.conf                   re_add_on_bucket_request_error = false
/opt/splunk/etc/system/default/server.conf                   rebalance_newgen_propagation_timeout = 60
/opt/splunk/etc/system/default/server.conf                   rebalance_pipeline_batch_size = 60
/opt/splunk/etc/system/default/server.conf                   rebalance_primaries_execution_limit_ms = 0
/opt/splunk/etc/system/default/server.conf                   rebalance_primary_failover_timeout = 75
/opt/splunk/etc/system/default/server.conf                   rebalance_search_completion_timeout = 180
/opt/splunk/etc/system/default/server.conf                   rebalance_threshold = 0.90
/opt/splunk/etc/system/default/server.conf                   register_forwarder_address =
/opt/splunk/etc/system/default/server.conf                   register_replication_address =
/opt/splunk/etc/system/default/server.conf                   register_search_address =
/opt/splunk/etc/system/default/server.conf                   rep_cxn_timeout = 60
/opt/splunk/etc/system/default/server.conf                   rep_max_rcv_timeout = 180
/opt/splunk/etc/system/default/server.conf                   rep_max_send_timeout = 180
/opt/splunk/etc/system/default/server.conf                   rep_rcv_timeout = 60
/opt/splunk/etc/system/default/server.conf                   rep_send_timeout = 60
/opt/splunk/etc/system/default/server.conf                   replication_factor = 3
/opt/splunk/etc/system/default/server.conf                   reporting_delay_period = 30
/opt/splunk/etc/system/default/server.conf                   restart_timeout = 60
/opt/splunk/etc/system/default/server.conf                   rolling_restart = restart
/opt/splunk/etc/system/default/server.conf                   s2sHeartbeatTimeout = 600
/opt/splunk/etc/system/default/server.conf                   search_factor = 2
/opt/splunk/etc/system/default/server.conf                   search_files_retry_timeout = 600
/opt/splunk/etc/system/default/server.conf                   searchable_rebalance = false
/opt/splunk/etc/system/default/server.conf                   searchable_target_sync_timeout = 60
/opt/splunk/etc/system/default/server.conf                   searchable_targets = true
/opt/splunk/etc/system/default/server.conf                   send_timeout = 60
/opt/splunk/etc/system/default/server.conf                   service_interval = 0
/opt/splunk/etc/system/default/server.conf                   service_jobs_msec = 100
/opt/splunk/etc/system/default/server.conf                   site_mappings =
/opt/splunk/etc/system/default/server.conf                   site_replication_factor = origin:2, total:3
/opt/splunk/etc/system/default/server.conf                   site_search_factor = origin:1, total:2
/opt/splunk/etc/system/default/server.conf                   summary_registration_batch_size = 1000
/opt/splunk/etc/system/default/server.conf                   summary_replication = false
/opt/splunk/etc/system/default/server.conf                   summary_update_batch_size = 10
/opt/splunk/etc/system/default/server.conf                   summary_wait_time = 660
/opt/splunk/etc/system/default/server.conf                   target_wait_time = 150
/opt/splunk/etc/system/default/server.conf                   throwOnBucketBuildReadError = false
/opt/splunk/etc/system/default/server.conf                   use_batch_mask_changes = true
/opt/splunk/etc/system/default/server.conf                   warm_bucket_replication_pre_upload = false
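Most of this listing comes from system/default, so the few lines contributed by an app are easy to miss. A rough sketch, assuming the btool --debug output has been saved as text with the source file path first on each line, that keeps only the lines not supplied by /opt/splunk/etc/system/default. The helper name is made up for illustration:

```python
def non_default_settings(btool_output, default_dir="/opt/splunk/etc/system/default/"):
    """Return (source_file, setting) pairs that do not come from system/default."""
    results = []
    for line in btool_output.splitlines():
        line = line.strip()
        if not line or line.startswith(default_dir):
            continue
        # btool --debug prints the source path, whitespace, then the setting or stanza.
        source, _, setting = line.partition(" ")
        results.append((source, setting.strip()))
    return results

# A few lines taken from the listing above.
sample = """\
/opt/splunk/etc/apps/cluster_search_base/default/server.conf [clustering]
/opt/splunk/etc/system/default/server.conf                       manual_detention = off
/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one
/opt/splunk/etc/apps/cluster_search_base/default/server.conf mode = searchhead
/opt/splunk/etc/system/default/server.conf                       pass4SymmKey =
"""

for source, setting in non_default_settings(sample):
    print(source, "->", setting)
```

Applied to the listing above, this leaves the stanza header, master_uri and mode, all from the cluster_search_base app.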

isoutamo
SplunkTrust

This line:

/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one

should be:

master_uri = https://<your CM FQDN>:<mgmt port>

 

walsborn
Path Finder

I checked the captain and can see that its /opt/splunk/etc/apps/cluster_search_base/default/server.conf also has master_uri = clustermaster:one, which matches the new search head.

But there is also a /opt/splunk/etc/apps/cluster_search_base/local/server.conf that contains a [clustering] stanza with a pass4SymmKey that I was not tracking.

Since this pass4SymmKey is in an app, will it work the same way? Do I enter the key in plain text and restart?

isoutamo
SplunkTrust

Best practice is to use a separate pass4SymmKey in every stanza where that is possible.

The format clustermaster:xyz is usually used when your SH is connected to several clusters. In those cases there must be a separate stanza with that name, and that stanza contains its own pass4SymmKey.

In your environment, could this be in a separate app?

So what is the output of:

splunk btool server list clustermaster:one --debug
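To illustrate the reference form described above, here is a small sketch that reads a server.conf-style text and follows master_uri = clustermaster:one to the stanza it names. It assumes Python's configparser is close enough to Splunk's conf syntax for a simple file like this (it is not Splunk's own parser). The host cm.example.com, the key placeholder and the function name are hypothetical:

```python
import configparser

# Hypothetical server.conf content; the host name and key are placeholders.
SERVER_CONF = """
[clustering]
mode = searchhead
master_uri = clustermaster:one

[clustermaster:one]
master_uri = https://cm.example.com:8089
pass4SymmKey = <key for this cluster>
"""

def resolve_masters(conf_text):
    """Map each master reference in [clustering] to the stanza it points at."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(conf_text)
    resolved = {}
    for ref in parser["clustering"]["master_uri"].split(","):
        ref = ref.strip()
        if ref.startswith("clustermaster:"):
            # Reference form: the URI and key live in their own stanza.
            stanza = parser[ref]
            resolved[ref] = {
                "master_uri": stanza.get("master_uri"),
                "has_own_key": "pass4SymmKey" in stanza,
            }
        else:
            # Direct form: the URI and key sit in [clustering] itself.
            resolved[ref] = {
                "master_uri": ref,
                "has_own_key": "pass4SymmKey" in parser["clustering"],
            }
    return resolved

print(resolve_masters(SERVER_CONF))
```

If the named stanza is missing, parser[ref] raises a KeyError, which mirrors the point of the btool check: the reference is only useful when the [clustermaster:one] stanza and its key actually exist somewhere in the merged configuration.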

walsborn
Path Finder

Hi @isoutamo , 

Thank you for all the work! It turns out that the app contained the pass4SymmKey that was tied to the cluster master. On the master, the key is in /opt/splunk/etc/system/local/server.conf under the [clustering] stanza.

 

Thanks again! I'm going to accept the previous reply.
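This outcome fits how Splunk layers configuration files: a setting in an app's local directory overrides the same setting in the app's default directory and in system/default. A simplified sketch, assuming a single app and the global-context order (system/local, then app local, then app default, then system/default). The layer labels, the function name and the key value are placeholders, and the real ordering across several apps is more involved:

```python
# Highest priority first (simplified to one app).
PRECEDENCE = ["system/local", "app/local", "app/default", "system/default"]

def effective_setting(layers, stanza, key):
    """Return (value, layer) from the highest-priority layer that sets the key."""
    for layer in PRECEDENCE:
        value = layers.get(layer, {}).get(stanza, {}).get(key)
        if value is not None:
            return value, layer
    return None, None

# Illustrative layers for the new search head; the key value is a placeholder.
layers = {
    "system/default": {"clustering": {"pass4SymmKey": ""}},
    "app/default": {"clustering": {"mode": "searchhead", "master_uri": "clustermaster:one"}},
    "app/local": {"clustering": {"pass4SymmKey": "<cluster key>"}},
}

print(effective_setting(layers, "clustering", "pass4SymmKey"))
print(effective_setting(layers, "clustering", "mode"))
```

With these layers the key comes from the app's local directory rather than the empty system default, so that is the value that has to match the one on the master.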
