Deploying onto Search Head pool sometimes causes errors

Muryoutaisuu
Communicator

Hi guys

We are testing the Search Head pooling functionality. We have one dedicated deployer and five search head cluster members. To deploy, we execute the following command:

splunk apply shcluster-bundle --answer-yes -target https://[MEMBER_HOSTNAME]:8089 -auth [SPLUNK_USER]:[PW]

Sometimes it works fine. Other times it fails, and the errors vary.

Error Nr. 1:

Error while deploying apps to target=https://[MEMBER_HOSTNAME]:8089 with members=5: ConfDeploymentException: Error while updating app=XXX on target=https://[MEMBER_IP]:8089: Non-200/201 status_pre=500; {"messages":[{"type":"ERROR","text":"\n In handler 'localapps': Error during app install: failed to extract app from /appl/splunk/var/run/splunk/bundle_tmp/2753df224a95e6e5.bundle to /appl/splunk/var/run/splunk/bundle_tmp/1074b24058b88cde: No such file or directory"}]}

Error Nr. 2:

Error while deploying apps to first member: ConfDeploymentException: Error while fetching apps baseline on target=https://[MEMBER_IP]:8089: Network-layer error: Connection reset by peer

Error Nr. 3:

Error while deploying apps to target=https://[MEMBER_HOSTNAME]:8089 with members=5: ConfDeploymentException: Error while fetching apps baseline on target=https://[MEMBER_IP]:8089: Network-layer error: Connection reset by peer

Error Nr. 4:

Error when getting master uri from target to do a rolling-restart Error connecting:  Connection refused

What I do not understand is why it sometimes works and sometimes does not.
Sometimes I have to execute the deploy command more than five times in a row before it succeeds, which is getting annoying.

Has anybody experienced the same? Or does anybody have a solution or an explanation?
Thanks
- Muryoutaisuu

Muryoutaisuu
Communicator

We found that some of the errors occurred when we deployed twice in quick succession. The search heads were still restarting or executing post-start tasks. The error messages are a bit misleading in that case.
However, I can no longer recall which of the four errors occurred in that situation, or whether the issue still exists on the first deployment attempt.
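
To avoid deploying while members are still restarting, one option is to wait until every member's management port accepts connections before running the deploy command. The following is only a rough sketch under assumptions: the helper names are made up for illustration, the port 8089 is taken from the command in the original post, and an open port does not prove that post-start tasks have finished, so an extra pause may still be needed.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, connection reset, timeouts and DNS failures.
        return False


def wait_for_members(members, port=8089, retries=30, delay=10.0):
    """Poll the members until all management ports accept connections.

    Returns the members that are still unreachable (empty list on success).
    """
    pending = list(members)
    for attempt in range(retries):
        pending = [m for m in pending if not port_is_open(m, port)]
        if not pending:
            break
        if attempt < retries - 1:
            time.sleep(delay)
    return pending


def build_apply_command(target_host, user, password, port=8089):
    """Assemble the deploy command from the original post as an argument list."""
    return [
        "splunk", "apply", "shcluster-bundle", "--answer-yes",
        "-target", f"https://{target_host}:{port}",
        "-auth", f"{user}:{password}",
    ]
```

On the deployer, you would call `wait_for_members([...])` with your own member host names and, only if it returns an empty list, pass the result of `build_apply_command(...)` to `subprocess.run(..., check=True)`.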

vgunti
Engager

The following error is caused by a wrong configuration in server.conf. Double-check the "mgmt_uri" property:

mgmt_uri = https://:

Error when getting master uri from target to do a rolling-restart Service Unavailable
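
A quick sanity check of the value can catch a "mgmt_uri" that lacks a host or port. This is a minimal sketch, not a Splunk tool: the function name is hypothetical, "sh1.example.com" is a placeholder host, and it assumes the expected form is a scheme, host and port such as https://sh1.example.com:8089. As far as I know the setting lives in the [shclustering] stanza of server.conf, but the check only looks at the value itself.

```python
from urllib.parse import urlsplit


def check_mgmt_uri(value):
    """Return a list of problems found in a mgmt_uri value (empty if none)."""
    problems = []
    parts = urlsplit(value.strip())
    if parts.scheme not in ("http", "https"):
        problems.append("scheme should be http or https")
    if not parts.hostname:
        problems.append("host name is missing")
    try:
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("port is missing")
    return problems


if __name__ == "__main__":
    for candidate in ("https://sh1.example.com:8089", "https://:"):
        print(candidate, check_mgmt_uri(candidate))
```

An empty list means the value at least has the expected shape. It does not confirm that the host resolves or that the member is reachable.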

rstrong30
Loves-to-Learn

Sounds like you have a firewall/network problem. I'm consistently getting Error number 4 from your list above.

kindlund
New Member

Nope. I'm getting the same errors also. It's not a firewall problem, as the systems are all directly connected.

The error I'm getting is:

/opt/splunk/bin/splunk apply shcluster-bundle -target https://:8089
Warning: Depending on the configuration changes being pushed, this command might initiate a rolling restart of the cluster members. Please refer to the documentation for the details. Do you wish to continue? [y/n]: y
Error when getting master uri from target to do a rolling-restart Service Unavailable

napomokoetle
Communicator

I upgraded from Splunk Enterprise 6.2.5 to 6.3 on Linux CentOS 6.5.

The error I get in splunkd.log on the cluster master when trying to run...

[root@ClusterMaster ~]# splunk apply shcluster-bundle --answer-yes -target https://10.zz.yyy.x:8089 -auth admin:adminPasss

is

09-25-2015 15:40:31.078 +0200 WARN AppsDeployHandler - Error while fetching members from uri=https://10.zz.yyy.x:8089: Non-200 status_code=503: Service Unavailable

Please help resolve!

Muryoutaisuu
Communicator

We do not have any firewalls between the servers, nor do we have network problems.
On the second search head cluster I do not have any trouble. I suspect that is because there is not much data to deploy there. Perhaps the network load generated by the deployment causes those strange errors...
