Deployment Architecture

Why do cluster slaves' web servers time out on startup with my cluster configuration?

BigJiggity
Engager

I have my cluster configuration built. My index master seems to be good; here is the config:

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3
site_replication_factor = origin:1,total:5
site_search_factor = origin:1,total:5

[license]
master_uri = https://host.domain.local:8089

[general]
pass4SymmKey = whatever
site = site3
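
For reference, I believe the CLI equivalent of that stanza is roughly the following (the secret is a placeholder), followed by a restart:

splunk edit cluster-config -mode master -multisite true -available_sites site1,site2,site3 -site site3 -site_replication_factor origin:1,total:5 -site_search_factor origin:1,total:5 -secret whatever
splunk restart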

The config on my indexers in each site looks like this:

[clustering]
master_uri = https://host.domain.local:8089
mode = slave

[license]
master_uri = https://host.domain.local:8089

[general]
pass4SymmKey = whatever
serverName = localhost01
site = site3
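
For reference, I believe the equivalent CLI on a peer is roughly the following (the replication port and secret are just example values), followed by a restart:

splunk edit cluster-config -mode slave -site site3 -master_uri https://host.domain.local:8089 -replication_port 9887 -secret whatever
splunk restart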

Obviously the site is changed appropriately on each indexer. Now when I restart the slaves, Splunk starts up normally until it gets to starting the web server, and then I get this:

Waiting for web server at http://127.0.0.1:8000 to be available............................................................................................................................................................................................................................................................................................................

WARNING: web interface does not seem to be available!

It has done this on every slave; if I remove the cluster config, it starts with no problem.

amiracle
Splunk Employee

I had the same problem, but that was because I used http://<host>:8089 instead of https://<host>:8089 when adding members to my search head cluster. The entire command looked like this:

bin/splunk init shcluster-config -auth admin:splunk123 -mgmt_uri https://splunk03:8089 -replication_port 34567 -replication_factor 2 -conf_deploy_fetch_url https://splunk01:8089 -secret splunk123 
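
The later steps also need the https scheme in the member URIs; for example, bootstrapping the captain looked roughly like this (hostnames are just my lab examples):

bin/splunk bootstrap shcluster-captain -servers_list "https://splunk01:8089,https://splunk02:8089,https://splunk03:8089" -auth admin:splunk123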

I hope that helps.

0 Karma

bevant
Explorer

I believe the OP is talking about an index cluster, not a search head cluster - and I have the exact same problem (latest 6.3).

When I comment out the clustering section, which on a peer only has "master_uri=" and "mode = slave", the Splunk instance starts and runs, but as soon as I add them back in, both 8000 and 8089 become unavailable.

The cluster master seems to report that peer being added correctly, but it reports this in the logs constantly:

10-19-2015 02:26:22.870 -0700 ERROR HttpClientRequest - HTTP client error: Read Timeout (while accessing https://192.168.141.130:8089/services/server/info)
10-19-2015 02:26:22.871 -0700 ERROR HttpClientRequest - HTTP client error: Connection reset by peer (while accessing http://192.168.141.130:8089/services/server/info)
10-19-2015 02:26:22.871 -0700 WARN  DistributedPeerManagerHeartbeat - Unable to get server info from peer: http://192.168.141.130:8089 due to: Connection reset by peer
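
For reference, the same endpoint can be checked by hand from the master, e.g. (credentials are placeholders):

curl -k -u admin:changeme https://192.168.141.130:8089/services/server/info
splunk show cluster-status -auth admin:changeme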
0 Karma

dxu_splunk
Splunk Employee

have you tried

bin/splunk set indexing-ready

on the cluster-master? (If that works, it's because the cluster master waits for N indexers to join the cluster before it allows them all to start working, where N is your replication_factor for single-site, or the max:N per site from your multisite policy; set indexing-ready bypasses that wait.)
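
For example (the admin credentials here are placeholders), and you can also see how many peers the master currently knows about with list cluster-peers:

bin/splunk set indexing-ready -auth admin:changeme
bin/splunk list cluster-peers -auth admin:changeme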

bevant
Explorer

Thanks for the response. Even when I have N indexers it doesn't work (even though the cluster is green and indexing), but I tried your suggestion anyway; it still doesn't work.

[edit]: Hmm, perhaps I stand corrected - the events no longer seem to be occurring in the logs, but I still can't wget that URL (and I can when clustering is disabled on the peers), so I'm guessing the master has just stopped trying. Is this possibly just normal? It's the only operational cluster I have access to at the moment, and it's the simplest one imaginable; a complete rebuild produces the same result.

0 Karma

luhadia_aditya
Path Finder
  • Bring your master up after the upgrade, passing all prompts.
  • Enable maintenance mode on the master by issuing the CLI command splunk enable maintenance-mode
  • Upgrade the slaves (peer nodes and search heads)
  • Start the peer nodes and search heads
  • Disable maintenance mode on the master by issuing the CLI command splunk disable maintenance-mode

PS - Make sure that while bringing down the slaves you use the splunk stop command instead of splunk offline. It's not going to be a rolling, online upgrade; a condensed command sequence follows below.
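
Condensed, the sequence is roughly this (the first and last commands run on the master, the middle steps on each slave):

splunk enable maintenance-mode      # on the master
splunk stop                         # on each peer / search head, then upgrade it
splunk start                        # on each peer / search head after the upgrade
splunk disable maintenance-mode     # on the master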

Hope this helps!

0 Karma

dxu_splunk
Splunk Employee

Is the master started?

0 Karma