Deployment Architecture

Deploying to a search head cluster sometimes causes errors

Muryoutaisuu
Communicator

Hi guys

We are testing the search head clustering functionality. We have one dedicated deployer and five search head cluster members. To deploy, we run the following command:

splunk apply shcluster-bundle --answer-yes -target https://[MEMBER_HOSTNAME]:8089 -auth [SPLUNK_USER]:[PW]

Sometimes it works fine. Sometimes it does not, and then it fails with different errors.

Error 1:

Error while deploying apps to target=https://[MEMBER_HOSTNAME]:8089 with members=5: ConfDeploymentException: Error while updating app=XXX on target=https://[MEMBER_IP]:8089: Non-200/201 status_pre=500; {"messages":[{"type":"ERROR","text":"\n In handler 'localapps': Error during app install: failed to extract app from /appl/splunk/var/run/splunk/bundle_tmp/2753df224a95e6e5.bundle to /appl/splunk/var/run/splunk/bundle_tmp/1074b24058b88cde: No such file or directory"}]}

Error 2:

Error while deploying apps to first member: ConfDeploymentException: Error while fetching apps baseline on target=https://[MEMBER_IP]:8089: Network-layer error: Connection reset by peer

Error 3:

Error while deploying apps to target=https://[MEMBER_HOSTNAME]:8089 with members=5: ConfDeploymentException: Error while fetching apps baseline on target=https://[MEMBER_IP]:8089: Network-layer error: Connection reset by peer

Error 4:

Error when getting master uri from target to do a rolling-restart Error connecting:  Connection refused
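Errors 2, 3 and 4 above all report that the connection to a member's management port was reset or refused, whereas error 1 is an HTTP 500 returned by the member's own app-install handler. If you wrap the deploy command in a script, a minimal sketch like the following can sort the output into "probably transient, retry after a pause" and "needs a closer look". The marker strings are copied from the messages in this thread; the function name and the two categories are hypothetical, and other Splunk versions may word these messages differently.

```python
# Hypothetical helper: the substrings below are copied from the deployer
# messages quoted in this thread; they are not an official error catalogue.
TRANSIENT_MARKERS = (
    "connection reset by peer",
    "connection refused",
    "service unavailable",
    "status_code=503",
)


def classify_deploy_error(message: str) -> str:
    """Return 'transient' if the message suggests the member was only
    temporarily unreachable, otherwise 'investigate'."""
    text = message.lower()
    if any(marker in text for marker in TRANSIENT_MARKERS):
        return "transient"
    return "investigate"


if __name__ == "__main__":
    samples = [
        "Network-layer error: Connection reset by peer",        # errors 2 and 3
        "Error connecting:  Connection refused",                # error 4
        "failed to extract app ... No such file or directory",  # error 1
    ]
    for sample in samples:
        print(classify_deploy_error(sample), "|", sample)
```

Substring matching is deliberately crude: it only helps decide whether rerunning the command is worth trying before digging into the member's own logs.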

What puzzles me is why it sometimes works and sometimes does not.
Sometimes I have to run the deploy command more than five times in a row, which is starting to annoy me.

Has anyone else experienced this? Does anyone have a solution, or at least an explanation?
Thanks
- Muryoutaisuu

Muryoutaisuu
Communicator

We found that some of the errors occurred when we deployed twice in quick succession: the search heads were still restarting or running post-start tasks, and the error messages in that case are somewhat misleading.
However, I no longer remember which of the four errors occurred in that situation, or whether the issue also occurs on the first deployment attempt.
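If some failures come from redeploying while members are still restarting, one way to script around it is to retry the same `splunk apply shcluster-bundle` command with a growing pause between attempts instead of rerunning it by hand. This is a minimal sketch, assuming the `splunk` binary is on `PATH` and that a non-zero exit status means the push failed; the function names, attempt count and delays are hypothetical, not tuned values. The command runner and the sleep function are injectable so the loop can be tried without a cluster.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical wrapper: the arguments mirror the command quoted at the top
# of this thread; host and credentials are the same placeholders.
DEPLOY_CMD = [
    "splunk", "apply", "shcluster-bundle", "--answer-yes",
    "-target", "https://[MEMBER_HOSTNAME]:8089",
    "-auth", "[SPLUNK_USER]:[PW]",
]


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd and return its exit status (assumes 0 means success)."""
    return subprocess.run(list(cmd)).returncode


def deploy_with_retry(
    cmd: Sequence[str] = DEPLOY_CMD,
    attempts: int = 5,
    first_delay: float = 60.0,
    runner: Callable[[Sequence[str]], int] = run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry the push, doubling the pause after each failure so that
    members get time to finish restarting between attempts."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return True
        if attempt < attempts:
            print(f"attempt {attempt} failed; waiting {delay:.0f}s")
            sleep(delay)
            delay *= 2
    return False
```

Doubling the delay trades a slower worst case for fewer pushes landing on members that are still busy; a fixed pause would also work if restarts take a predictable time.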

vgunti
Engager

The following error is caused by a wrong configuration in server.conf; double-check the "mgmt_uri" property:

mgmt_uri = https://:

Error when getting master uri from target to do a rolling-restart Service Unavailable
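To check the setting vgunti mentions on each member, a minimal sketch using Python's standard `configparser` can read `server.conf` and confirm that `mgmt_uri` carries a scheme, a host and a port. Assumptions: the setting sits in the `[shclustering]` stanza, where search head cluster members normally keep it; the file is close enough to INI for `configparser` to read (Splunk .conf syntax is not strictly identical); and the function name and verdict strings are hypothetical.

```python
import configparser
from urllib.parse import urlsplit


def check_mgmt_uri(conf_text: str, stanza: str = "shclustering") -> str:
    """Return a short verdict on the mgmt_uri setting in server.conf text."""
    # interpolation=None: values may contain a literal '%'.
    # strict=False: tolerate repeated stanzas/keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_option(stanza, "mgmt_uri"):
        return "missing"
    uri = urlsplit(parser.get(stanza, "mgmt_uri").strip())
    try:
        port = uri.port  # raises ValueError on a non-numeric port
    except ValueError:
        return "bad port"
    if uri.scheme not in ("http", "https") or not uri.hostname or port is None:
        return "incomplete"
    return "ok"
```

For example, read each member's `$SPLUNK_HOME/etc/system/local/server.conf` into a string and pass it in; any verdict other than `"ok"` is worth a closer look.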

rstrong30
New Member

Sounds like you have a firewall/network problem. I'm consistently getting Error number 4 from your list above.

kindlund
New Member

Nope. I'm getting the same errors. It's not a firewall problem, as the systems are all directly connected.

The error I'm getting is:

/opt/splunk/bin/splunk apply shcluster-bundle -target https://:8089
Warning: Depending on the configuration changes being pushed, this command might initiate a rolling restart of the cluster members. Please refer to the documentation for the details. Do you wish to continue? [y/n]: y
Error when getting master uri from target to do a rolling-restart Service Unavailable

napomokoetle
Communicator

I upgraded from Splunk Enterprise 6.2.5 to 6.3 on Linux CentOS 6.5.

The error I get in splunkd.log on the cluster master when I run

[root@ClusterMaster ~]# splunk apply shcluster-bundle --answer-yes -target https://10.zz.yyy.x:8089 -auth admin:adminPasss

is

09-25-2015 15:40:31.078 +0200 WARN AppsDeployHandler - Error while fetching members from uri=https://10.zz.yyy.x:8089: Non-200 status_code=503: Service Unavailable

Please help me resolve this!
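To see how often the deployer hits this 503 and against which member, a minimal sketch can pull the timestamp, level, component, URI and status code out of `splunkd.log` lines shaped like the one quoted above. The regular expressions are written only against that single sample line, so treat them as an assumption; other message types or Splunk versions may be laid out differently. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Written against the one AppsDeployHandler line quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)\s+-\s+(?P<message>.*)$"
)
STATUS_RE = re.compile(r"uri=(?P<uri>\S+?):\s+Non-200 status_code=(?P<status>\d+)")


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    record = m.groupdict()
    s = STATUS_RE.search(record["message"])
    if s:
        record["uri"] = s.group("uri")
        record["status"] = int(s.group("status"))
    return record


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count (uri, status) pairs for lines that carry a Non-200 status."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and "status" in rec:
            counts[(rec["uri"], rec["status"])] += 1
    return counts
```

For example, open `$SPLUNK_HOME/var/log/splunk/splunkd.log` on the deployer and print `count_statuses(f).most_common()` to see whether the 503s cluster on one member.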

Muryoutaisuu
Communicator

We do not have any firewalls between the servers, nor do we have network problems.
On our second search head cluster I have no trouble at all. I suspect that is because there is much less data to deploy there. Perhaps the network load generated by the deployment causes these strange errors...
