I see the log below in operation_install, showing repeated failures to connect to https://gravity-site.kube-system.svc.cluster.local:3009/healthz.
================
Wed Nov 10 02:40:41 UTC [INFO] [DAPD02] Executing postInstall hook for site:6.1.48.
Created Pod "site-app-post-install-125088-zqsmd" in namespace "kube-system".
Container "post-install-hook" created, current state is "waiting, reason PodInitializing".
Pod "site-app-post-install-125088-zqsmd" in namespace "kube-system", has changed state from "Pending" to "Running".
Container "post-install-hook" changed status from "waiting, reason PodInitializing" to "running".
[ERROR]: failed connecting to https://gravity-site.kube-system.svc.cluster.local:3009/healthz
Get https://gravity-site.kube-system.svc.cluster.local:3009/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Container "post-install-hook" changed status from "running" to "terminated, exit code 255".
Container "post-install-hook" restarted, current state is "running".
[ERROR]: failed connecting to https://gravity-site.kube-system.svc.cluster.local:3009/healthz
Get https://gravity-site.kube-system.svc.cluster.local:3009/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Container "post-install-hook" changed status from "running" to "terminated, exit code 255".
Container "post-install-hook" changed status from "terminated, exit code 255" to "waiting, reason CrashLoopBackOff".
================
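For what it's worth, the failing check can be reproduced by hand. The command below is only a sketch: the 5-second timeout and the -k (skip certificate verification) flag are my choices, not what the hook itself uses, and the service name only resolves from a context where cluster DNS is reachable (e.g. inside a pod):

curl -sk --max-time 5 https://gravity-site.kube-system.svc.cluster.local:3009/healthz

If this times out the same way the hook does, the hook pod simply cannot reach the gravity-site service over the overlay network.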
The gravity cluster status after the installation failure:
================
[root@DAPD02 crashreport]# gravity status
Cluster name: charmingmeitner2182
Cluster status: degraded (application status check failed)
Application: dsp, version 1.2.1
Gravity version: 6.1.48 (client) / 6.1.48 (server)
Join token: b9b088ce63c0a703ee740ba5dfb380d
Periodic updates: Not Configured
Remote support: Not Configured
Last completed operation:
* 3-node install
ID: 46614e3c-fcd1-4974-8cd7-dc404d1880b
Started: Wed Nov 10 02:33 UTC (1 hour ago)
Completed: Wed Nov 10 02:35 UTC (1 hour ago)
Cluster endpoints:
* Authentication gateway:
- 10.69.80.1:32009
- 10.69.80.2:32009
- 10.69.89.3:32009
* Cluster management URL:
- https://10.69.80.1:32009
- https://10.69.80.2:32009
- https://10.69.89.3:32009
Cluster nodes:
Masters:
* DAPD02 / 10.69.80.1 / master
Status: healthy
[!] overlay packet loss for node 10.69.89.3 is higher than the allowed threshold of 20% (current packet loss at 100%)
[!] overlay packet loss for node 10.69.80.2 is higher than the allowed threshold of 20% (current packet loss at 100%)
Remote access: online
* DWPD03 / 10.69.80.2 / master
Status: healthy
[!] overlay packet loss for node 10.69.80.1 is higher than the allowed threshold of 20% (current packet loss at 100%)
[!] overlay packet loss for node 10.69.89.3 is higher than the allowed threshold of 20% (current packet loss at 100%)
Remote access: online
* DDPD04 / 10.69.89.3 / master
Status: healthy
[!] overlay packet loss for node 10.69.80.2 is higher than the allowed threshold of 20% (current packet loss at 100%)
[!] overlay packet loss for node 10.69.80.1 is higher than the allowed threshold of 20% (current packet loss at 100%)
Remote access: online
================
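The 100% overlay packet loss reported between every pair of nodes points at the overlay network itself rather than at gravity-site. A quick way to confirm this on the wire (a sketch, assuming the cluster uses the default flannel VXLAN backend on UDP 8472 and that tcpdump is available on the nodes):

# on DWPD03 (10.69.80.2), watch for overlay traffic arriving from a peer
tcpdump -ni any udp port 8472

# on DAPD02 (10.69.80.1), generate overlay traffic, e.g. ping a pod IP hosted on 10.69.80.2
ping <pod IP on 10.69.80.2>

If nothing shows up in tcpdump while the ping runs, UDP 8472 is being dropped between the nodes.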
When I configured my test environment with these firewall settings, I could reproduce the same symptom.
The installation then completes successfully with the firewall settings below:
allow 53 for TCP
allow 53 for UDP
allow 8472 for UDP
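For reference, this is how the ports can be opened on each node. My test nodes use firewalld; that is an assumption about the customer's environment, and equivalent iptables or security-group rules work just as well:

firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=53/udp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --reload

Port 53 carries cluster DNS and 8472/udp is the flannel VXLAN overlay port, which matches both the healthz timeout and the 100% overlay packet loss shown in gravity status.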