Deployment Architecture

Cluster master register_replication_address Invalid hostname when adding peer nodes?

ajiwanand
Path Finder

I am trying to use register_replication_address to tell a cluster master running in front of an ELB to contact the indexer on a different address. However, when I try to add the peer, I get the following error on the CM:
REST_Calls - app=search POST cluster/master/peers/ id=526D8BF5-7412-4934-AC47-08C699290CC9: active_bundle_id -> [14310A4AABD23E85BBD4559C4A3B59F8], add_type -> [Initial-Add], base_generation_id -> [0], batch_serialno -> [1], batch_size -> [2], buckets -> [], forwarderdata_rcv_port -> [9997], forwarderdata_use_ssl -> [0], indexes -> [], last_complete_generation_id -> [0], latest_bundle_id -> [14310A4AABD23E85BBD4559C4A3B59F8], mgmt_port -> [8089], register_forwarder_address -> [], register_replication_address -> [https://10.0.7.181:8089], register_search_address -> [], replication_port -> [9887], replication_use_ssl -> [0], replications -> [], server_name -> [ip-10-0-7-181.ca-central-1.compute.internal], site -> [default], splunk_version -> [8.0.2], splunkd_build_number -> [a7f645ddaf91], status -> [Up]
INFO AdminManager - Setting capability.write=edit_indexer_cluster for handler clustermasterpeers.
INFO AdminManager - Setting capability.read=edit_indexer_cluster for handler clustermasterpeers.
DEBUG AdminManager - Validating argument values...
DEBUG AdminManagerValidation - Validating rule='validate(len(name) < 1024, 'Parameter "name" must be less than 1024 characters.')' for arg='name'.
ERROR ClusterMasterPeerHandler - Invalid host name https://10.0.7.181:8089
DEBUG AdminManager - URI /services/cluster/master/peers/?output_mode=json generated an AdminManagerExceptionBase exception in handler 'clustermasterpeers': Invalid host name https://10.0.7.181:8089


INFO CMSlave - event=addPeer status=failure shutdown=false request: AddPeerRequest: { _id= _indexVec=''active_bundle_id=14310A4AABD23E85BBD4559C4A3B59F8 add_type=Initial-Add base_generation_id=0 batch_serialno=1 batch_size=2 forwarderdata_rcv_port=9997 forwarderdata_use_ssl=0 last_complete_generation_id=0 latest_bundle_id=14310A4AABD23E85BBD4559C4A3B59F8 mgmt_port=8089 name=526D8BF5-7412-4934-AC47-08C699290CC9 register_forwarder_address= register_replication_address=https://10.0.7.181:8089 register_search_address= replication_port=9887 replication_use_ssl=0 replications= server_name=ip-10-0-7-181.ca-central-1.compute.internal site=default splunk_version=8.0.2 splunkd_build_number=a7f645ddaf91 status=Up }
04-23-2020 02:03:56.478 +0000 ERROR CMSlave - event=addPeer start over and retry after sleep 12800ms reason addType=Initial Add Batch SN=1/2 failed. add_peer_network_ms=5

Notice that the debug log mentions a validation rule requiring the name to be shorter than 1024 characters. Could the request be failing that validation?
The cluster master can "resolve" the IP, although since it is already an IP address I see no reason why it would need resolving. The "can't resolve '(null)'" line in the nslookup output is odd, though. I also added a hosts file entry, and it made no difference:
`
nslookup: can't resolve '(null)'

Name: 10.0.7.181
Address 1: 10.0.7.181 ip-10-0-7-181.ca-central-1.compute.internal

The Clustermaster can reach the Indexer on that port:
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.7.181:8089.
`

Any reason why this happens?
I've read a few posts, and register_replication_address seems to be the solution to my problem; however, I am unsure why it is "unable to resolve hostname".

*** UPDATE ***
I also want to add that I've been doing more testing on two plain EC2 instances with all traffic allowed between them. nslookup on AWS resolves the IPs fine, and I still cannot get this working. If I remove register_replication_address in these cases, the peer joins without any problem, which is really weird. I'm not sure what the issue is or how to troubleshoot further when the log just says "Invalid host name".

1 Solution

scelikok
SplunkTrust

I think the problem is the value of the "register_replication_address" parameter, which appears as "https://10.0.7.181:8089" in your error logs. This parameter must be an IP address or a fully qualified machine/domain name, with no URL scheme or port.

You can test with the setting below in server.conf:

register_replication_address = 10.0.7.181

If you are using indexer discovery you will need to set "register_forwarder_address" too.
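To make the difference between the rejected and the accepted value concrete, here is a minimal Python sketch that pre-checks a candidate value before it goes into server.conf. The helper `is_bare_host` is hypothetical and is not part of Splunk; it only approximates the "IP address or fully qualified name" rule described above and is not the validation the cluster master actually runs.

```python
import ipaddress
import re

# One DNS label: letters, digits, and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_bare_host(value: str) -> bool:
    """Return True if value is a plain IP address or host name.

    Hypothetical pre-check (an assumption, not Splunk's own logic):
    anything carrying a scheme, port, or path is rejected.
    """
    value = value.strip()
    if not value:
        return False
    try:
        # Accepts IPv4 and IPv6 literals such as 10.0.7.181.
        ipaddress.ip_address(value)
        return True
    except ValueError:
        pass
    if len(value) > 253:
        return False
    # Characters such as ":" and "/" fail the label pattern, so
    # "https://10.0.7.181:8089" is rejected here.
    labels = value.rstrip(".").split(".")
    return all(_LABEL.match(label) for label in labels)
```

With this sketch, `is_bare_host("10.0.7.181")` and `is_bare_host("ip-10-0-7-181.ca-central-1.compute.internal")` pass, while `is_bare_host("https://10.0.7.181:8089")` does not.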

If this reply helps you an upvote and "Accept as Solution" is appreciated.

ajiwanand
Path Finder

omg that is exactly the problem!!
I can't believe it's so dumb lol. The second question, I guess, would be how to change the mgmt port from 8089. For example, if I want to use only 443 for communication with the indexer, is that possible?

I will accept this as the answer btw, this is definitely the solution.

dxu_splunk
Splunk Employee

The CM uses "register_replication_address" to communicate with the indexers. When an indexer attempts to join, the CM will ping the indexer's IP, or "register_replication_address" if that parameter is set. This checks that two-way communication between CM <--> indexers works before an indexer is allowed in.

If it's not reachable, the indexer is not allowed in.

ajiwanand
Path Finder

I added security groups to allow all traffic (the CM wasn't allowed to ping the indexer before), but I still get the same error.

ajiwanand
Path Finder

Also, by two-way communication do you mean ICMP, or communication on the port specified for the register replication address?
The replication port has nothing to do with this, right? That's the port the indexers use to communicate with each other for replication?

dxu_splunk
Splunk Employee

Whatever you put in "register_replication_address" is exactly how the entire cluster will communicate with that indexer. Make sure it's valid: see if you can telnet to that address and port.
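If telnet is not installed on the host, the same reachability check can be sketched in a few lines of Python. The function name `can_connect` and the default timeout are assumptions made for illustration. The check only confirms that a TCP connection can be opened; it says nothing about whether Splunk accepts the configured value.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: roughly what `telnet host port` tells you,
    without any protocol exchange once the socket is open.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from the CM host, `can_connect("10.0.7.181", 8089)` would test the management port shown in the logs above, and `can_connect("10.0.7.181", 9887)` the replication port. Note that the argument is a bare host plus a separate port, not a URL.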

ajiwanand
Path Finder

The CM can reach the indexer on that port and even list the APIs when I run curl. When I run a tcpdump, I'm not even seeing the connections from the cluster master.
`curl https://10.0.7.181:8089 -k

https://10.0.7.181:8089/
2020-04-23T22:56:35+00:00

<name>Splunk</name>


<title>rpc</title>
<id>https://10.0.7.181:8089/rpc</id>
<updated>1970-01-01T00:00:00+00:00</updated>
<link href="/rpc" rel="alternate"/>

`
