I've got a search head cluster running and have a host that I've set as the cluster captain. Other than the configuration settings that make it the captain, it is set up like all the other hosts in the cluster. But I cannot load its web page, even though I can load the web page for every other search head.
Instead I get a "404" error, despite the startup output showing that the web interface has come up just fine.
Splunk> Winning the War on Error
Checking http port : open
Checking mgmt port : open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port : open
Checking configuration... Done.
Checking critical directories... Done
Validated: _audit _internal _introspection _telemetry _thefishbucket history main summary
Checking filesystem compatibility... Done
Checking conf files for problems...
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunk/splunk-220.127.116.11-7651b7244cf2-linux-2.6-x86_64-manifest'
All installed files intact.
Checking replication_port port : open
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
[ OK ]
Waiting for web server at https://127.0.0.1:4443 to be available............. Done
If you get stuck, we're here to help.
Look for answers here: http://docs.splunk.com
The Splunk web interface is at https://splunk-search-lead:4443
Any idea why this would be happening? I've deleted the installation twice and reinstalled from scratch and the same outcome happens each time. This is the first time using the latest version of Splunk. Our previous installations ran 7.1.2 and we never had this problem at all using the same deployment steps. This is also on new hosts so there's no previous installation on these hosts either.
Here are the things I would check.
1) Does this search head have iptables or firewalld running? If so, ensure the proper ports are open (replication port, web port, splunkd, etc.).
2) Is this captain able to talk to the other search heads?
3) Search index=_internal host= for any errors being reported.
Usually when I run into this problem it is either the software firewall on the OS or a firewall rule on the network that is blocking the traffic.
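If it helps, here is a rough Python sketch for testing those ports from another machine, since the startup checks only show that the ports are open locally. It only attempts a plain TCP connection, so it says nothing about whether Splunk Web returns a valid page. The function names are my own, not anything Splunk provides.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host: str, ports: dict) -> dict:
    """Map each label in `ports` to whether its port accepted a connection."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

For example, `check_ports("splunk-search-lead", {"web": 4443, "mgmt": 8089})` would test the web port from your startup output. The 8089 value is an assumption (the usual splunkd default), so swap in whatever your deployment uses and add the replication port. A `False` from a remote host while the local startup check says "open" would point at a firewall in between.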
As you can see from the output I posted in the original question, all the ports are open and there's no firewall blocking them. The search captain can connect to all the other hosts, although I don't know why that would have any bearing on a 404 error when trying to load the search captain's web UI. No hosts are reporting errors either. Running the Splunk Monitoring health check shows everything is fine.
What happens when you migrate the captain to another search head in the cluster? Does this instance start working and the 404 issue move to the new captain? In general, I've usually gotten 404s when an app has been removed or the user I'm logged in as doesn't have permissions on the objects I'm trying to view.
I should have mentioned that I did try that, and the captaincy doesn't seem to be the issue. Any other host set as the captain will display the web interface just fine.
However, I have just seen that I'm getting this message:
Changes from the other members are not replicating to this member, and changes on this member are not replicating to other members. Consider performing a destructive configuration resync on this search head cluster member
I did the resync and then, after a while, restarted Splunk, but there is still no web page.