I'm adding a new search head to my cluster and keep receiving the error: CMSearchHead - 'Unable to reach master' for master=https://xxx:8089.
I'm running version 7.3.4 with a 3-member search head cluster that I'm trying to grow to 4, plus 6 clustered indexers. I have a deployment server and a search head deployer. The cluster master acts as the master for everything except forwarders. The nodes sit behind an F5 load balancer, all in the same pool, and are communicating. The new search head can communicate with the indexers but not with the master, even though I can see that the search head has checked in and its details appear on the master.
Initialization of the search head went fine, and the pass4SymmKeys have been replaced throughout the environment so they all match.
I have changed CMSearchHead logging to DEBUG and get no additional information about the error.
Any great ideas to try?
Best practice is to use a separate pass4SymmKey in every stanza where possible.
Usually the clustermaster:xyz format is used when your SH is connected to several clusters. In that case there must be a separate [clustermaster:xyz] stanza, and that stanza contains its own pass4SymmKey.
In your environment, could this live in a separate app?
So what is the output of
splunk btool server list clustermaster:one --debug
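For reference, a minimal sketch of what the clustermaster:xyz pattern looks like on a search head attached to several indexer clusters (stanza names "one"/"two" and hostnames here are placeholders, not values from this environment):

```ini
# server.conf on the search head (illustrative only)
[clustering]
mode = searchhead
master_uri = clustermaster:one, clustermaster:two

[clustermaster:one]
master_uri = https://cm-one.example.com:8089
pass4SymmKey = <key shared with cluster one's master>

[clustermaster:two]
master_uri = https://cm-two.example.com:8089
pass4SymmKey = <key shared with cluster two's master>
```

The point is that `master_uri = clustermaster:one` is only a reference; it resolves through the matching `[clustermaster:one]` stanza, which carries the actual URI and key.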
Hi
is this working:
curl -kv https://xxxx:8089
If not, then it's probably a firewall issue. If it works, then it's something else.
Is there anything else in splunkd.log (on the CM and on that new SHC member)?
Have you already added it to the SHC as a member?
r. Ismo
Hello @isoutamo,
Thanks for the quick response.
curl -kv https://xxx:8089 was successful with all details.
On the master I can see a few errors, but nothing associated with the new search head:
"WARN DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked
ERROR DigestProcessor - Failed signature match
ERROR LMHttpUtil - Failed to verify HMAC signature, uri: /services/cluster/master/info/?output_mode=json
WARN DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked
ERROR DigestProcessor - Failed signature match"
I have added the search head to the cluster, and the initialization went well.
Hi
can you share your server.conf from this new SHC member?
What do splunk show shcluster-status --verbose and splunk show kvstore-status show?
Are you using automatic replication of search peers on the SHC, or do you add them manually, member by member?
r. Ismo
Hi @isoutamo,
server.conf from new search head:
[sslConfig]
#sslKeysfilePassword = xxx
sslPassword = $xxx
sslVersions = *,-ssl2
sslVersionsForClient = *,-ssl2
cipherSuite = TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH
[lmpool:auto_generated_pool_download-trial]
description = auto_generated_pool_download-trial
quota = MAX
slaves = *
stack_id = download-trial
[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder
[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free
[general]
serverName = new search head
pass4SymmKey = xxx
[replication_port://9996]
[shclustering]
conf_deploy_fetch_url = https://search head deployer:8089
disabled = false
mgmt_uri = https://splunksh-p1n01:8089
pass4SymmKey = xxx
replication_factor = 2
manual_detention = off
splunk show shcluster-status --verbose
dynamic_captain : 1
elected_captain : Thu Jul 30 22:22:30 2020
initialized_flag : 1
label : splunksh-p1n03
mgmt_uri : https://splunksh-p1n03:8089
min_peers_joined_flag : 1
rolling_restart_flag : 0
service_ready_flag : 1
Members:
splunksh-p1n03
label : splunksh-p1n03
mgmt_uri : https://splunksh-p1n03:8089
mgmt_uri_alias : https://xx:8089
status : Up
splunksh-p1n02
label : splunksh-p1n02
last_conf_replication : Mon Aug 3 07:08:21 2020
mgmt_uri : https://splunksh-p1n02:8089
mgmt_uri_alias : https://xx:8089
status : Up
splunksh-p1n04
label : splunksh-p1n04
last_conf_replication : Mon Aug 3 07:08:21 2020
mgmt_uri : https://splunksh-p1n04.xx:8089
mgmt_uri_alias : https://xx:8089
status : Up
splunksh-p1n01
label : splunksh-p1
last_conf_replication : Mon Aug 3 07:08:21 2020
mgmt_uri : https://xx:8089
mgmt_uri_alias : https://xx:8089
status : Up
The replication factor is set at 2 and is automatic.
Do you have a separate app for the clustering config, or is it missing?
Can you run
splunk btool server list clustering --debug
An app is in place for search head clustering.
Btool Output:
/opt/splunk/etc/apps/cluster_search_base/default/server.conf [clustering]
/opt/splunk/etc/system/default/server.conf access_logging_for_heartbeats = false
/opt/splunk/etc/system/default/server.conf allow_default_empty_p4symmkey = true
/opt/splunk/etc/system/default/server.conf allowed_hbmiss_count = 3
/opt/splunk/etc/system/default/server.conf auto_rebalance_primaries = true
/opt/splunk/etc/system/default/server.conf available_sites =
/opt/splunk/etc/system/default/server.conf backup_and_restore_primaries_in_maintenance = false
/opt/splunk/etc/system/default/server.conf buckets_per_addpeer = 1000
/opt/splunk/etc/system/default/server.conf buckets_to_summarize = primaries
/opt/splunk/etc/system/default/server.conf commit_retry_time = 300
/opt/splunk/etc/system/default/server.conf constrain_singlesite_buckets = true
/opt/splunk/etc/system/default/server.conf cxn_timeout = 60
/opt/splunk/etc/system/default/server.conf decommission_force_finish_idle_time = 0
/opt/splunk/etc/system/default/server.conf decommission_node_force_timeout = 300
/opt/splunk/etc/system/default/server.conf decommission_search_jobs_wait_secs = 180
/opt/splunk/etc/system/default/server.conf deferred_cluster_status_update = true
/opt/splunk/etc/system/default/server.conf enableS2SHeartbeat = true
/opt/splunk/etc/system/default/server.conf executor_workers = 10
/opt/splunk/etc/system/default/server.conf generation_poll_interval = 5
/opt/splunk/etc/system/default/server.conf heartbeat_period = 1
/opt/splunk/etc/system/default/server.conf heartbeat_timeout = 60
/opt/splunk/etc/system/default/server.conf idle_connections_pool_size = -1
/opt/splunk/etc/system/default/server.conf local_executor_workers = 10
/opt/splunk/etc/system/default/server.conf maintenance_mode = false
/opt/splunk/etc/system/default/server.conf manual_detention = off
/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one
/opt/splunk/etc/system/default/server.conf max_auto_service_interval = 30
/opt/splunk/etc/system/default/server.conf max_fixup_time_ms = 5000
/opt/splunk/etc/system/default/server.conf max_nonhot_rep_kBps = 0
/opt/splunk/etc/system/default/server.conf max_peer_build_load = 2
/opt/splunk/etc/system/default/server.conf max_peer_rep_load = 5
/opt/splunk/etc/system/default/server.conf max_peer_sum_rep_load = 5
/opt/splunk/etc/system/default/server.conf max_peers_to_download_bundle = 5
/opt/splunk/etc/system/default/server.conf max_primary_backups_per_service = 10
/opt/splunk/etc/system/default/server.conf max_replication_errors = 3
/opt/splunk/etc/apps/cluster_search_base/default/server.conf mode = searchhead
/opt/splunk/etc/system/default/server.conf multisite = false
/opt/splunk/etc/system/default/server.conf notify_scan_min_period = 10
/opt/splunk/etc/system/default/server.conf notify_scan_period = 10
/opt/splunk/etc/system/default/server.conf pass4SymmKey =
/opt/splunk/etc/system/default/server.conf percent_peers_to_restart = 10
/opt/splunk/etc/system/default/server.conf quiet_period = 60
/opt/splunk/etc/system/default/server.conf rcv_timeout = 60
/opt/splunk/etc/system/default/server.conf re_add_on_bucket_request_error = false
/opt/splunk/etc/system/default/server.conf rebalance_newgen_propagation_timeout = 60
/opt/splunk/etc/system/default/server.conf rebalance_pipeline_batch_size = 60
/opt/splunk/etc/system/default/server.conf rebalance_primaries_execution_limit_ms = 0
/opt/splunk/etc/system/default/server.conf rebalance_primary_failover_timeout = 75
/opt/splunk/etc/system/default/server.conf rebalance_search_completion_timeout = 180
/opt/splunk/etc/system/default/server.conf rebalance_threshold = 0.90
/opt/splunk/etc/system/default/server.conf register_forwarder_address =
/opt/splunk/etc/system/default/server.conf register_replication_address =
/opt/splunk/etc/system/default/server.conf register_search_address =
/opt/splunk/etc/system/default/server.conf rep_cxn_timeout = 60
/opt/splunk/etc/system/default/server.conf rep_max_rcv_timeout = 180
/opt/splunk/etc/system/default/server.conf rep_max_send_timeout = 180
/opt/splunk/etc/system/default/server.conf rep_rcv_timeout = 60
/opt/splunk/etc/system/default/server.conf rep_send_timeout = 60
/opt/splunk/etc/system/default/server.conf replication_factor = 3
/opt/splunk/etc/system/default/server.conf reporting_delay_period = 30
/opt/splunk/etc/system/default/server.conf restart_timeout = 60
/opt/splunk/etc/system/default/server.conf rolling_restart = restart
/opt/splunk/etc/system/default/server.conf s2sHeartbeatTimeout = 600
/opt/splunk/etc/system/default/server.conf search_factor = 2
/opt/splunk/etc/system/default/server.conf search_files_retry_timeout = 600
/opt/splunk/etc/system/default/server.conf searchable_rebalance = false
/opt/splunk/etc/system/default/server.conf searchable_target_sync_timeout = 60
/opt/splunk/etc/system/default/server.conf searchable_targets = true
/opt/splunk/etc/system/default/server.conf send_timeout = 60
/opt/splunk/etc/system/default/server.conf service_interval = 0
/opt/splunk/etc/system/default/server.conf service_jobs_msec = 100
/opt/splunk/etc/system/default/server.conf site_mappings =
/opt/splunk/etc/system/default/server.conf site_replication_factor = origin:2, total:3
/opt/splunk/etc/system/default/server.conf site_search_factor = origin:1, total:2
/opt/splunk/etc/system/default/server.conf summary_registration_batch_size = 1000
/opt/splunk/etc/system/default/server.conf summary_replication = false
/opt/splunk/etc/system/default/server.conf summary_update_batch_size = 10
/opt/splunk/etc/system/default/server.conf summary_wait_time = 660
/opt/splunk/etc/system/default/server.conf target_wait_time = 150
/opt/splunk/etc/system/default/server.conf throwOnBucketBuildReadError = false
/opt/splunk/etc/system/default/server.conf use_batch_mask_changes = true
/opt/splunk/etc/system/default/server.conf warm_bucket_replication_pre_upload = false
This:
/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one
should be:
master_uri = https://<your CM FQDN>:<mgmt port>
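In other words, there are two valid shapes for this setting. Either point at the CM directly, or keep the clustermaster:one reference and add the named stanza it resolves to. A sketch, with a placeholder hostname:

```ini
# Option A: direct reference in the [clustering] stanza
[clustering]
mode = searchhead
master_uri = https://cm.example.com:8089
pass4SymmKey = <clustering key from the master>

# Option B: keep master_uri = clustermaster:one, but then a
# [clustermaster:one] stanza must exist somewhere in the config:
# [clustermaster:one]
# master_uri = https://cm.example.com:8089
# pass4SymmKey = <clustering key from the master>
```

If `master_uri = clustermaster:one` is set but no `[clustermaster:one]` stanza exists, the SH has no resolvable master address, which would produce exactly the "Unable to reach master" error.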
I brought up the captain and can see that /opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one matches on the new search head.
But there is a /opt/splunk/etc/apps/cluster_search_base/local/server.conf that contains a [clustering] stanza with a pass4SymmKey that I was not tracking.
Since this pass4SymmKey is in an app, will it work the same way? Do I enter the key in plain text and restart?
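Yes, a pass4SymmKey in an app's server.conf behaves the same as one in system/local: enter it in plain text and splunkd encrypts it in place on the next restart. A sketch, assuming the app path from the btool output above:

```ini
# /opt/splunk/etc/apps/cluster_search_base/local/server.conf
# Before restart (entered by hand):
[clustering]
pass4SymmKey = <plaintext key matching the cluster master>

# After restart, splunkd rewrites the value in place as an
# encrypted string, e.g.:
# pass4SymmKey = $7$Jf9...
```

The key on the search head must match the one under [clustering] on the cluster master; on modern releases the encrypted form starts with `$7$` (older releases used `$1$`).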
Hi @isoutamo ,
Thank you for all the work! It turns out that app contained the pass4SymmKey tied to the cluster master. On the master, the key is in /opt/splunk/etc/system/local/server.conf under the [clustering] stanza.
Thanks again! I'm going to accept the previous reply.