Deployment Architecture

Why do cluster slaves' web servers time out on startup with my cluster configuration?

BigJiggity
Engager

I have my cluster configuration built. My index master seems to be good; here is the config:

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3
site_replication_factor = origin:1,total:5
site_search_factor = origin:1,total:5

[license]
master_uri = https://host.domain.local:8089

[general]
pass4SymmKey = whatever
site = site3
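
For reference, I believe the CLI equivalent of that stanza is roughly the following (the secret is a placeholder), followed by a restart:

splunk edit cluster-config -mode master -multisite true -available_sites site1,site2,site3 -site site3 -site_replication_factor origin:1,total:5 -site_search_factor origin:1,total:5 -secret whatever
splunk restart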

The config on my indexers in each site looks like this:

[clustering]
master_uri = https://host.domain.local:8089
mode = slave

[license]
master_uri = https://host.domain.local:8089

[general]
pass4SymmKey = whatever
serverName = localhost01
site = site3
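
For reference, I believe the equivalent CLI on a peer is roughly the following (the replication port and secret are just example values), followed by a restart:

splunk edit cluster-config -mode slave -site site3 -master_uri https://host.domain.local:8089 -replication_port 9887 -secret whatever
splunk restart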

Obviously the site is changed appropriately on each indexer. Now when I restart the slaves, Splunk starts up normally until it gets to starting the web server, and then I get this:

Waiting for web server at http://127.0.0.1:8000 to be available............................................................................................................................................................................................................................................................................................................

WARNING: web interface does not seem to be available!

It has done this on every slave; if I remove the cluster config, it starts with no problem.

amiracle
Splunk Employee

I had the same problem, but that was because I used http://<host>:8089 instead of https://<host>:8089 when adding members to my search head cluster. The entire command looked like this:

bin/splunk init shcluster-config -auth admin:splunk123 -mgmt_uri https://splunk03:8089 -replication_port 34567 -replication_factor 2 -conf_deploy_fetch_url https://splunk01:8089 -secret splunk123 
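
The later steps also need the https scheme in the member URIs; for example, bootstrapping the captain looked roughly like this (hostnames are just my lab examples):

bin/splunk bootstrap shcluster-captain -servers_list "https://splunk01:8089,https://splunk02:8089,https://splunk03:8089" -auth admin:splunk123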

I hope that helps.

0 Karma

bevant
Explorer

I believe the OP is talking about an index cluster, not a search head cluster - and I have the exact same problem (latest 6.3).

When I comment out the clustering section, which on a peer only has "master_uri=" and "mode = slave", the Splunk instance starts and runs, but as soon as I add them back in, both 8000 and 8089 become unavailable.

The cluster master seems to report that peer being added correctly, but it reports this in the logs constantly:

10-19-2015 02:26:22.870 -0700 ERROR HttpClientRequest - HTTP client error: Read Timeout (while accessing https://192.168.141.130:8089/services/server/info)
10-19-2015 02:26:22.871 -0700 ERROR HttpClientRequest - HTTP client error: Connection reset by peer (while accessing http://192.168.141.130:8089/services/server/info)
10-19-2015 02:26:22.871 -0700 WARN  DistributedPeerManagerHeartbeat - Unable to get server info from peer: http://192.168.141.130:8089 due to: Connection reset by peer
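
For reference, the same endpoint can be checked by hand from the master, e.g. (credentials are placeholders):

curl -k -u admin:changeme https://192.168.141.130:8089/services/server/info
splunk show cluster-status -auth admin:changeme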
0 Karma

dxu_splunk
Splunk Employee

have you tried

bin/splunk set indexing-ready

on the cluster-master? (If that works, it's because the cluster master waits for N indexers to join the cluster before it allows them all to start working, where N is your replication_factor for single-site, or the max:N per site from your multisite policy; set indexing-ready bypasses that wait.)
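
For example (the admin credentials here are placeholders), and you can also see how many peers the master currently knows about with list cluster-peers:

bin/splunk set indexing-ready -auth admin:changeme
bin/splunk list cluster-peers -auth admin:changeme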

bevant
Explorer

Thanks for the response. Even when I have N indexers it doesn't work (even though the cluster is green and indexing), but I tried your suggestion anyway; it still doesn't work.

[edit]: Hmm, perhaps I stand corrected - the events no longer seem to be occurring in the logs, but I still can't wget that URL (and I can when clustering is disabled on the peers), so I'm guessing the master has just stopped trying. Is this possibly just normal? It's the only operational cluster I have access to at the moment, and it's the simplest one imaginable; a complete rebuild produces the same result.

0 Karma

luhadia_aditya
Path Finder
  • Bring your master up after the upgrade, passing all prompts.
  • Enable maintenance mode on the master by issuing the CLI command splunk enable maintenance-mode
  • Upgrade the slaves (peer nodes and search heads)
  • Start the peer nodes and search heads
  • Disable maintenance mode on the master by issuing the CLI command splunk disable maintenance-mode

PS - Make sure that while bringing down the slaves you use the splunk stop command instead of splunk offline. It's not going to be a rolling, online upgrade; a condensed command sequence follows below.
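
Condensed, the sequence is roughly this (the first and last commands run on the master, the middle steps on each slave):

splunk enable maintenance-mode      # on the master
splunk stop                         # on each peer / search head, then upgrade it
splunk start                        # on each peer / search head after the upgrade
splunk disable maintenance-mode     # on the master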

Hope this helps!

0 Karma

dxu_splunk
Splunk Employee

Is the master started?

0 Karma