I'm adding a new search head to my cluster and keep receiving the error: CMSearchHead - 'Unable to reach master' for master=https://xxx:8089.
I'm running version 7.3.4 with a 3-member search head cluster that I'm trying to grow to 4, plus 6 clustered indexers. I have a deployment server and a search head deployer. The cluster master acts as the master for everything except forwarders. The nodes sit behind an F5 load balancer, all in the same pool, and are communicating. The new search head can communicate with the indexers but not with the master, even though I can see that the search head has checked in and its details appear on the master.
Initialization of the search head went fine, and the pass4SymmKeys have been replaced throughout the environment so they all match.
I have changed CMSearchHead logging to DEBUG and get no additional information about the error.
Any great ideas to try?
Best practice is to use a separate pass4SymmKey in every stanza where possible.
Usually the clustermaster:xyz format is used when your SH is connected to several clusters. In that case there must be a separate [clustermaster:xyz] stanza, and that stanza contains its own pass4SymmKey.
In your environment, could this live in a separate app?
So what is the output of
splunk btool server list clustermaster:one --debug
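For reference, a minimal sketch of what the clustermaster:xyz pattern looks like on a search head attached to several indexer clusters (stanza names "one"/"two" and hostnames here are placeholders, not values from this environment):

```ini
# server.conf on the search head (illustrative only)
[clustering]
mode = searchhead
master_uri = clustermaster:one, clustermaster:two

[clustermaster:one]
master_uri = https://cm-one.example.com:8089
pass4SymmKey = <key shared with cluster one's master>

[clustermaster:two]
master_uri = https://cm-two.example.com:8089
pass4SymmKey = <key shared with cluster two's master>
```

The point is that `master_uri = clustermaster:one` is only a reference; it resolves through the matching `[clustermaster:one]` stanza, which carries the actual URI and key.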
Hi
is this working:
curl -kv https://xxxx:8089
If not, then it's probably a firewall issue. If it works, then it's something else.
Is there anything else in splunkd.log (on the CM and on that new SHC member)?
Have you already added it to the SHC as a member?
r. Ismo
Hello @isoutamo,
Thanks for the quick response.
curl -kv https://xxx:8089 was successful with all details.
On the master I can see a few errors, but nothing associated with the new search head:
"WARN DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked
ERROR DigestProcessor - Failed signature match
ERROR LMHttpUtil - Failed to verify HMAC signature, uri: /services/cluster/master/info/?output_mode=json
WARN DistributedPeerManager - Cannot determine a latest common bundle, search may be blocked
ERROR DigestProcessor - Failed signature match"
I have added the search head to the cluster, and the initialization went well.
Hi
can you share your server.conf from this new SHC member?
What do splunk show shcluster-status --verbose and splunk show kvstore-status show?
Are you using automatic replication of search peers on the SHC, or do you add them manually, member by member?
r. Ismo
Hi @isoutamo,
server.conf from new search head:
[sslConfig]
#sslKeysfilePassword = xxx
sslPassword = $xxx
sslVersions = *,-ssl2
sslVersionsForClient = *,-ssl2
cipherSuite = TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH
[lmpool:auto_generated_pool_download-trial]
description = auto_generated_pool_download-trial
quota = MAX
slaves = *
stack_id = download-trial
[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder
[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free
[general]
serverName = new search head
pass4SymmKey = xxx
[replication_port://9996]
[shclustering]
conf_deploy_fetch_url = https://search head deployer:8089
disabled = false
mgmt_uri = https://splunksh-p1n01:8089
pass4SymmKey = xxx
replication_factor = 2
manual_detention = off
splunk show shcluster-status --verbose
dynamic_captain : 1
elected_captain : Thu Jul 30 22:22:30 2020
initialized_flag : 1
label : splunksh-p1n03
mgmt_uri : https://splunksh-p1n03:8089
min_peers_joined_flag : 1
rolling_restart_flag : 0
service_ready_flag : 1
Members:
splunksh-p1n03
label : splunksh-p1n03
mgmt_uri : https://splunksh-p1n03:8089
mgmt_uri_alias : https://xx:8089
status : Up
splunksh-p1n02
label : splunksh-p1n02
last_conf_replication : Mon Aug 3 07:08:21 2020
mgmt_uri : https://splunksh-p1n02:8089
mgmt_uri_alias : https://xx:8089
status : Up
splunksh-p1n04
label : splunksh-p1n04
last_conf_replication : Mon Aug 3 07:08:21 2020
mgmt_uri : https://splunksh-p1n04.xx:8089
mgmt_uri_alias : https://xx:8089
status : Up
splunksh-p1n01
label : splunksh-p1
last_conf_replication : Mon Aug 3 07:08:21 2020
mgmt_uri : https://xx:8089
mgmt_uri_alias : https://xx:8089
status : Up
The replication factor is set at 2 and is automatic.
Do you have a separate app for the clustering config, or is it missing?
Can you run
splunk btool server list clustering --debug
An app is in place for search head clustering.
Btool Output:
/opt/splunk/etc/apps/cluster_search_base/default/server.conf [clustering]
/opt/splunk/etc/system/default/server.conf access_logging_for_heartbeats = false
/opt/splunk/etc/system/default/server.conf allow_default_empty_p4symmkey = true
/opt/splunk/etc/system/default/server.conf allowed_hbmiss_count = 3
/opt/splunk/etc/system/default/server.conf auto_rebalance_primaries = true
/opt/splunk/etc/system/default/server.conf available_sites =
/opt/splunk/etc/system/default/server.conf backup_and_restore_primaries_in_maintenance = false
/opt/splunk/etc/system/default/server.conf buckets_per_addpeer = 1000
/opt/splunk/etc/system/default/server.conf buckets_to_summarize = primaries
/opt/splunk/etc/system/default/server.conf commit_retry_time = 300
/opt/splunk/etc/system/default/server.conf constrain_singlesite_buckets = true
/opt/splunk/etc/system/default/server.conf cxn_timeout = 60
/opt/splunk/etc/system/default/server.conf decommission_force_finish_idle_time = 0
/opt/splunk/etc/system/default/server.conf decommission_node_force_timeout = 300
/opt/splunk/etc/system/default/server.conf decommission_search_jobs_wait_secs = 180
/opt/splunk/etc/system/default/server.conf deferred_cluster_status_update = true
/opt/splunk/etc/system/default/server.conf enableS2SHeartbeat = true
/opt/splunk/etc/system/default/server.conf executor_workers = 10
/opt/splunk/etc/system/default/server.conf generation_poll_interval = 5
/opt/splunk/etc/system/default/server.conf heartbeat_period = 1
/opt/splunk/etc/system/default/server.conf heartbeat_timeout = 60
/opt/splunk/etc/system/default/server.conf idle_connections_pool_size = -1
/opt/splunk/etc/system/default/server.conf local_executor_workers = 10
/opt/splunk/etc/system/default/server.conf maintenance_mode = false
/opt/splunk/etc/system/default/server.conf manual_detention = off
/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one
/opt/splunk/etc/system/default/server.conf max_auto_service_interval = 30
/opt/splunk/etc/system/default/server.conf max_fixup_time_ms = 5000
/opt/splunk/etc/system/default/server.conf max_nonhot_rep_kBps = 0
/opt/splunk/etc/system/default/server.conf max_peer_build_load = 2
/opt/splunk/etc/system/default/server.conf max_peer_rep_load = 5
/opt/splunk/etc/system/default/server.conf max_peer_sum_rep_load = 5
/opt/splunk/etc/system/default/server.conf max_peers_to_download_bundle = 5
/opt/splunk/etc/system/default/server.conf max_primary_backups_per_service = 10
/opt/splunk/etc/system/default/server.conf max_replication_errors = 3
/opt/splunk/etc/apps/cluster_search_base/default/server.conf mode = searchhead
/opt/splunk/etc/system/default/server.conf multisite = false
/opt/splunk/etc/system/default/server.conf notify_scan_min_period = 10
/opt/splunk/etc/system/default/server.conf notify_scan_period = 10
/opt/splunk/etc/system/default/server.conf pass4SymmKey =
/opt/splunk/etc/system/default/server.conf percent_peers_to_restart = 10
/opt/splunk/etc/system/default/server.conf quiet_period = 60
/opt/splunk/etc/system/default/server.conf rcv_timeout = 60
/opt/splunk/etc/system/default/server.conf re_add_on_bucket_request_error = false
/opt/splunk/etc/system/default/server.conf rebalance_newgen_propagation_timeout = 60
/opt/splunk/etc/system/default/server.conf rebalance_pipeline_batch_size = 60
/opt/splunk/etc/system/default/server.conf rebalance_primaries_execution_limit_ms = 0
/opt/splunk/etc/system/default/server.conf rebalance_primary_failover_timeout = 75
/opt/splunk/etc/system/default/server.conf rebalance_search_completion_timeout = 180
/opt/splunk/etc/system/default/server.conf rebalance_threshold = 0.90
/opt/splunk/etc/system/default/server.conf register_forwarder_address =
/opt/splunk/etc/system/default/server.conf register_replication_address =
/opt/splunk/etc/system/default/server.conf register_search_address =
/opt/splunk/etc/system/default/server.conf rep_cxn_timeout = 60
/opt/splunk/etc/system/default/server.conf rep_max_rcv_timeout = 180
/opt/splunk/etc/system/default/server.conf rep_max_send_timeout = 180
/opt/splunk/etc/system/default/server.conf rep_rcv_timeout = 60
/opt/splunk/etc/system/default/server.conf rep_send_timeout = 60
/opt/splunk/etc/system/default/server.conf replication_factor = 3
/opt/splunk/etc/system/default/server.conf reporting_delay_period = 30
/opt/splunk/etc/system/default/server.conf restart_timeout = 60
/opt/splunk/etc/system/default/server.conf rolling_restart = restart
/opt/splunk/etc/system/default/server.conf s2sHeartbeatTimeout = 600
/opt/splunk/etc/system/default/server.conf search_factor = 2
/opt/splunk/etc/system/default/server.conf search_files_retry_timeout = 600
/opt/splunk/etc/system/default/server.conf searchable_rebalance = false
/opt/splunk/etc/system/default/server.conf searchable_target_sync_timeout = 60
/opt/splunk/etc/system/default/server.conf searchable_targets = true
/opt/splunk/etc/system/default/server.conf send_timeout = 60
/opt/splunk/etc/system/default/server.conf service_interval = 0
/opt/splunk/etc/system/default/server.conf service_jobs_msec = 100
/opt/splunk/etc/system/default/server.conf site_mappings =
/opt/splunk/etc/system/default/server.conf site_replication_factor = origin:2, total:3
/opt/splunk/etc/system/default/server.conf site_search_factor = origin:1, total:2
/opt/splunk/etc/system/default/server.conf summary_registration_batch_size = 1000
/opt/splunk/etc/system/default/server.conf summary_replication = false
/opt/splunk/etc/system/default/server.conf summary_update_batch_size = 10
/opt/splunk/etc/system/default/server.conf summary_wait_time = 660
/opt/splunk/etc/system/default/server.conf target_wait_time = 150
/opt/splunk/etc/system/default/server.conf throwOnBucketBuildReadError = false
/opt/splunk/etc/system/default/server.conf use_batch_mask_changes = true
/opt/splunk/etc/system/default/server.conf warm_bucket_replication_pre_upload = false
This:
/opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one
should be:
master_uri = https://<your CM FQDN>:<mgmt port>
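In other words, there are two valid shapes for this setting. Either point at the CM directly, or keep the clustermaster:one reference and add the named stanza it resolves to. A sketch, with a placeholder hostname:

```ini
# Option A: direct reference in the [clustering] stanza
[clustering]
mode = searchhead
master_uri = https://cm.example.com:8089
pass4SymmKey = <clustering key from the master>

# Option B: keep master_uri = clustermaster:one, but then a
# [clustermaster:one] stanza must exist somewhere in the config:
# [clustermaster:one]
# master_uri = https://cm.example.com:8089
# pass4SymmKey = <clustering key from the master>
```

If `master_uri = clustermaster:one` is set but no `[clustermaster:one]` stanza exists, the SH has no resolvable master address, which would produce exactly the "Unable to reach master" error.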
I brought up the captain and can see that /opt/splunk/etc/apps/cluster_search_base/default/server.conf master_uri = clustermaster:one matches on the new search head.
But there is a /opt/splunk/etc/apps/cluster_search_base/local/server.conf that contains a [clustering] stanza with a pass4SymmKey that I was not tracking.
Since this pass4SymmKey is in an app, will it work the same way? Do I enter the key in plain text and restart?
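Yes, a pass4SymmKey in an app's server.conf behaves the same as one in system/local: enter it in plain text and splunkd encrypts it in place on the next restart. A sketch, assuming the app path from the btool output above:

```ini
# /opt/splunk/etc/apps/cluster_search_base/local/server.conf
# Before restart (entered by hand):
[clustering]
pass4SymmKey = <plaintext key matching the cluster master>

# After restart, splunkd rewrites the value in place as an
# encrypted string, e.g.:
# pass4SymmKey = $7$Jf9...
```

The key on the search head must match the one under [clustering] on the cluster master; on modern releases the encrypted form starts with `$7$` (older releases used `$1$`).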
Hi @isoutamo ,
Thank you for all the work! It turns out that app contained the pass4SymmKey tied to the cluster master. On the master, the key is in /opt/splunk/etc/system/local/server.conf under the [clustering] stanza.
Thanks again! I'm going to accept the previous reply.