Running "apply shcluster-bundle" in a search head cluster, why am I getting the error "no captain found amongst members"?

rainerzufall
Path Finder

Hello,

Today I modified the /etc/system/local/authentication.conf file on all Search Head Cluster members, because most settings should be pushed by the Deployer in a separate app. Authentication is still working fine (local and LDAP).
Now, when I run /opt/splunk/bin/splunk apply shcluster-bundle ... I get the following error:

Error while deploying apps to target=https://name.xyz:8089 with members=2: no captain found amongst members

The internal log is as follows:

127.0.0.1 - admin [02/Mar/2016:11:10:35.265 +0000] "POST /services/apps/deploy HTTP/1.1" 500 245 - - - 10529ms

But the SHC looks fine in the Distributed Management Console, and checking the cluster status on the CLI gives the following output:

name1# /opt/splunk/bin/splunk show shcluster-status

 Captain:
                          dynamic_captain : 1
                          elected_captain : Wed Mar  2 10:48:04 2016
                                       id : B2542A43-0D49-4235-ABAA-6749581BA6DC
                         initialized_flag : 1
                                    label : name1
                         maintenance_mode : 0
                                 mgmt_uri : https://name1.xyz:8089
                    min_peers_joined_flag : 1
                     rolling_restart_flag : 0
                       service_ready_flag : 1

 Members: 
        name2
                                    label : name2
                                 mgmt_uri : https://name2.xyz:8089
                           mgmt_uri_alias : https://1.1.1.2:8089
                                   status : Up
        name3
                                    label : name3
                                 mgmt_uri : https://name3.xyz:8089
                           mgmt_uri_alias : https://1.1.1.3:8089
                                   status : Up

Thanks,
/Rainer


muebel
SplunkTrust

Hi Rainer, based on your show shcluster-status output, it looks like you are getting this message because the captain is not actually a member of the cluster. While name2 and name3 are present in the members list, name1 is not.
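The membership check described above can be scripted. Below is a minimal Python sketch, assuming the show shcluster-status layout from the question: a "Captain:" block of "key : value" lines, followed by a "Members:" block in which each member starts with a bare name line. The helper names are hypothetical, and other Splunk versions may format this output differently.

```python
from __future__ import annotations


def parse_shcluster_status(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split `splunk show shcluster-status` output into captain fields and member records.

    Assumes the layout from the question: a "Captain:" block of "key : value"
    lines, then a "Members:" block where each member begins with a bare name line.
    """
    captain: dict[str, str] = {}
    members: list[dict[str, str]] = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("Captain:", "Members:"):
            section = line[:-1].lower()  # "captain" or "members"
            continue
        if " : " in line:
            # Split on the first " : " only, so timestamps and URIs stay intact.
            key, value = (part.strip() for part in line.split(" : ", 1))
            if section == "captain":
                captain[key] = value
            elif section == "members" and members:
                members[-1][key] = value
        elif section == "members":
            # A bare line such as "name2" starts a new member record.
            members.append({"name": line})
    return captain, members


def captain_is_member(text: str) -> bool:
    """True if the captain's label also appears in the Members list."""
    captain, members = parse_shcluster_status(text)
    labels = {m.get("label", m["name"]) for m in members}
    return captain.get("label") in labels


if __name__ == "__main__":
    # Abridged copy of the output from the question.
    status = """
 Captain:
                                     label : name1
                                  mgmt_uri : https://name1.xyz:8089
 Members:
        name2
                                     label : name2
                                    status : Up
        name3
                                     label : name3
                                    status : Up
"""
    print(captain_is_member(status))
```

Run against the output in the question, this prints False, because name1 never appears under Members.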

Additionally, I would specifically target the captain when running the apply shcluster-bundle command, i.e. https://name1.xyz:8089 instead of https://name.xyz:8089.
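To follow that suggestion without retyping the URI, the captain's mgmt_uri can be read from the same status output and passed after -target. This minimal Python sketch only builds the argument list for "splunk apply shcluster-bundle -target <URI>:<management_port> -auth <username>:<password>"; the helper names are hypothetical, and the /opt/splunk path and the credentials are placeholders. It executes nothing; the resulting list could be handed to subprocess.run(..., check=True) on the deployer.

```python
from __future__ import annotations

from typing import Optional


def captain_mgmt_uri(status_text: str) -> Optional[str]:
    """Return the captain's mgmt_uri from `splunk show shcluster-status` output, if present."""
    in_captain = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line in ("Captain:", "Members:"):
            in_captain = line == "Captain:"
            continue
        # "mgmt_uri " (with the trailing space) deliberately skips "mgmt_uri_alias".
        if in_captain and line.startswith("mgmt_uri ") and " : " in line:
            return line.split(" : ", 1)[1].strip()
    return None


def apply_bundle_argv(target_uri: str, user: str, password: str,
                      splunk_bin: str = "/opt/splunk/bin/splunk") -> list[str]:
    """Build (but do not run) the deployer command that pushes the bundle via target_uri."""
    return [splunk_bin, "apply", "shcluster-bundle",
            "-target", target_uri, "-auth", f"{user}:{password}"]


if __name__ == "__main__":
    status = " Captain:\n  mgmt_uri : https://name1.xyz:8089\n Members:\n  name2\n"
    uri = captain_mgmt_uri(status)
    if uri is not None:
        # Placeholder credentials; substitute your own admin account.
        print(apply_bundle_argv(uri, "admin", "changeme"))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if the password contains special characters.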

I would try restarting name1 to see whether that prompts a re-election and, hopefully, lets name1 join the cluster successfully. Otherwise, it looks like you might have a deeper problem with the SHC that would require some assistance from Splunk Support.

Please let me know if this helps!

pfender
Explorer

I recently ran into the same issue: a captain was elected but was missing from the member list and no longer responded to the other members. A reboot helped, but not for long; the cluster became unstable again pretty quickly. Digging deeper, I found the dispatch directory filling up (150k+ directories) because the reaper wasn't cleaning up, so I/O went through the roof. I identified a real-time (RT) scheduled search that caused Splunk (6.5.5) to keep all the rt_scheduler__nobody* directories. Rewriting the search fixed it, and after cleaning out the dispatch directory the cluster ran fine again. I'm afraid I may have spotted a bug in this version.
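If you suspect the same cause, a quick way to see whether the dispatch directory is piling up is to count its job directories. This is a minimal, read-only Python sketch, assuming the default location $SPLUNK_HOME/var/run/splunk/dispatch (falling back to /opt/splunk when SPLUNK_HOME is unset) and using the rt_scheduler__nobody prefix mentioned above; the function names are hypothetical and nothing is deleted.

```python
from __future__ import annotations

import os


def default_dispatch_dir() -> str:
    """$SPLUNK_HOME/var/run/splunk/dispatch, assuming /opt/splunk if SPLUNK_HOME is unset."""
    home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    return os.path.join(home, "var", "run", "splunk", "dispatch")


def count_dispatch_dirs(dispatch_dir: str, prefix: str = "rt_scheduler__nobody") -> tuple[int, int]:
    """Return (total job directories, directories whose name starts with prefix)."""
    total = matching = 0
    with os.scandir(dispatch_dir) as entries:
        for entry in entries:
            # Only directories count as dispatched jobs; skip files and symlinks.
            if not entry.is_dir(follow_symlinks=False):
                continue
            total += 1
            if entry.name.startswith(prefix):
                matching += 1
    return total, matching


if __name__ == "__main__":
    path = default_dispatch_dir()
    if os.path.isdir(path):
        total, rt = count_dispatch_dirs(path)
        print(f"{path}: {total} job directories, {rt} starting with rt_scheduler__nobody")
    else:
        print(f"No dispatch directory at {path}")
```

Running it periodically on each member shows whether the count keeps growing, which would suggest the reaper is not keeping up.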

rainerzufall
Path Finder

I rebooted the whole box (a Splunk restart was not enough), and a new captain was elected. I now see all three nodes as cluster members. Thank you for the hint!

muebel
SplunkTrust

Awesome, glad to hear! 😄

Raghav2384
Motivator

How many search heads do you have in total (including the captain)? Is splunkd up on all of them?

Also, when you push authentication.conf, I am assuming you also have the LDAP strategy with the bind password on each and every search head, correct? Sorry if I misread; the reason I ask is that you cannot push a single copy of an LDAP strategy from the Deployer when its password is already encrypted. That happened to me once when I was new to SHC.
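One way to catch this before pushing is to scan the Deployer's copy of authentication.conf for LDAP stanzas whose bindDNpassword is already encrypted, since Splunk encrypts that value with the local instance's splunk.secret and another search head generally cannot decrypt it unless it shares that file. A minimal Python sketch, assuming encrypted values start with "$1$" or "$7$" (verify against your Splunk version) and handling only simple "[stanza]" and "key = value" lines; the function name and the sample stanza are hypothetical.

```python
from __future__ import annotations

# Assumption: Splunk-encrypted values carry one of these prefixes; check your version.
ENCRYPTED_PREFIXES = ("$1$", "$7$")


def stanzas_with_encrypted_bind_password(conf_text: str) -> list[str]:
    """List stanza names whose bindDNpassword already looks encrypted.

    Handles only simple "[stanza]" headers and "key = value" lines;
    comment lines are skipped and continuation lines are not interpreted.
    """
    stanza = None
    found: list[str] = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1].strip()
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if stanza and key == "bindDNpassword" and value.startswith(ENCRYPTED_PREFIXES):
                found.append(stanza)
    return found


if __name__ == "__main__":
    # Hypothetical example; real stanza names and hosts will differ.
    sample = """
[authentication]
authType = LDAP
authSettings = my_ldap

[my_ldap]
host = ldap.example.com
bindDN = cn=splunk,dc=example,dc=com
bindDNpassword = $1$AbCdEf==
"""
    print(stanzas_with_encrypted_bind_password(sample))
```

For the sample above this prints ['my_ldap']; any stanza it reports would need a plain-text password in the pushed app (or a matching splunk.secret on every member) before the push.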

And as @harsmarvania57 mentioned, name1 should appear in the members list as well.

Assuming you are on the latest build, have you tried this:
http://docs.splunk.com/Documentation/Splunk/6.3.3/DistSearch/Staticcaptain
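The linked page configures a static captain with "splunk edit shcluster-config": -mode captain on the chosen captain and -mode member on every other node, each with -captain_uri pointing at the captain and -election false. Check the exact flags against that page for your version. The minimal Python sketch below only generates those per-node argument lists so the captain URI stays consistent across nodes; it runs nothing, and the node URIs are the ones from this thread used as examples.

```python
from __future__ import annotations


def static_captain_argvs(captain_uri: str, member_uris: list[str],
                         splunk_bin: str = "/opt/splunk/bin/splunk") -> dict[str, list[str]]:
    """Map each node's mgmt URI to the command that pins captain_uri as static captain."""
    def argv(mode: str) -> list[str]:
        return [splunk_bin, "edit", "shcluster-config",
                "-election", "false", "-mode", mode, "-captain_uri", captain_uri]

    commands = {captain_uri: argv("captain")}
    for uri in member_uris:
        if uri != captain_uri:  # the captain already got its "-mode captain" command
            commands[uri] = argv("member")
    return commands


if __name__ == "__main__":
    nodes = ["https://name1.xyz:8089", "https://name2.xyz:8089", "https://name3.xyz:8089"]
    for node, cmd in static_captain_argvs(nodes[0], nodes).items():
        print(node, " ".join(cmd))
```

Each command has to be run on the node it is keyed by, starting with the captain.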

Thanks,
Raghav

rainerzufall
Path Finder

There are 3 members in total in the cluster, and splunkd is up and running on all of them. The LDAP config seems to be OK on all devices, since I can log in with the LDAP account when accessing the nodes directly.
I'll try the static captain approach.

rainerzufall
Path Finder

I tried the static captain configuration on the dynamic captain and got the following output:

 In handler 'shclusterconfig': Could not contact captain.  Check that the captain is up, the captain_uri=https://name1:8089 and secret are specified correctly Err : Failure, rc=2: Connect to=https://name1:8089 timed out; exceeded 30sec LowerLevelErrors = SocketError connecting to=name1:8089 WARN: Connect to=name1:8089 timed out; exceeded 30sec
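Since the error says the connection to name1:8089 timed out, a reasonable first check is whether the management port is reachable at all from the node running the command. A minimal Python sketch using only the standard library; it tests a plain TCP connect, not TLS or the cluster secret, and the function name is hypothetical.

```python
import socket


def mgmt_port_reachable(host: str, port: int = 8089, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False
```

For example, calling mgmt_port_reachable("name1") from name2 and name3 returning False would point at the network or a hung splunkd on name1 rather than at the static captain settings.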

It is extremely strange, but after rebooting the whole box (a splunkd restart was not enough), a new captain was elected and now everything is fine.

harsmarvania57
Ultra Champion

Can you please check why the members list shows only "name2" and "name3"? "name1" should be in Members as well.

rainerzufall
Path Finder

name1 is not in the members list. How can I check why?
