Installation

DSP 1.2.1 installation failed to execute phase "/runtime"

sylim_splunk
Splunk Employee
Splunk Employee

I see below log in operation_install showing continuous failure to connect to https://gravity-site.kube-system.svc.cluster.local:3009/healthz.
================
Wed Nov 10 02:40:41 UTC [INFO] [DAPD02] Executing postInstall hook for site:6.1.48.
Created Pod "site-app-post-install-125088-zqsmd" in namespace "kube-system".

Container "post-install-hook" created, current state is "waiting, reason PodInitializing".

Pod "site-app-post-install-125088-zqsmd" in namespace "kube-system", has changed state from "Pending" to "Running".
Container "post-install-hook" changed status from "waiting, reason PodInitializing" to "running".

^[[31m[ERROR]: failed connecting to https://gravity-site.kube-system.svc.cluster.local:3009/healthz
Get https://gravity-site.kube-system.svc.cluster.local:3009/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
^[[0mContainer "post-install-hook" changed status from "running" to "terminated, exit code 255".

Container "post-install-hook" restarted, current state is "running".

^[[31m[ERROR]: failed connecting to https://gravity-site.kube-system.svc.cluster.local:3009/healthz
Get https://gravity-site.kube-system.svc.cluster.local:3009/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
^[[0mContainer "post-install-hook" changed status from "running" to "terminated, exit code 255".

Container "post-install-hook" changed status from "terminated, exit code 255" to "waiting, reason CrashLoopBackOff".
================

The gravity cluster status after the installation failure:
================
[root@DAPD02 crashreport]# gravity status
Cluster name: charmingmeitner2182
Cluster status: degraded (application status check failed)
Application: dsp, version 1.2.1
Gravity version: 6.1.48 (client) / 6.1.48 (server)
Join token: b9b088ce63c0a703ee740ba5dfb380d
Periodic updates: Not Configured
Remote support: Not Configured
Last completed operation:
* 3-node install
ID: 46614e3c-fcd1-4974-8cd7-dc404d1880b
Started: Wed Nov 10 02:33 UTC (1 hour ago)
Completed: Wed Nov 10 02:35 UTC (1 hour ago)
Cluster endpoints:
* Authentication gateway:
- 10.69.80.1:32009
- 10.69.80.2:32009
- 10.69.89.3:32009
* Cluster management URL:
- https://10.69.80.1:32009
- https://10.69.80.2:32009
- https://10.69.89.3:32009
Cluster nodes:
Masters:
* DAPD02 / 10.69.80.1 / master
Status: healthy
[!] overlay packet loss for node 10.69.89.3 is higher than the allowed threshold of 20% (current packet loss at 100%)
[!] overlay packet loss for node 10.69.80.2 is higher than the allowed threshold of 20% (current packet loss at 100%)
Remote access: online
* DWPD03 / 10.69.80.2 / master
Status: healthy
[!] overlay packet loss for node 10.69.80.1 is higher than the allowed threshold of 20% (current packet loss at 100%)
[!] overlay packet loss for node 10.69.89.3 is higher than the allowed threshold of 20% (current packet loss at 100%)
Remote access: online
* DDPD04 / 10.69.89.3 / master
Status: healthy
[!] overlay packet loss for node 10.69.80.2 is higher than the allowed threshold of 20% (current packet loss at 100%)
[!] overlay packet loss for node 10.69.80.1 is higher than the allowed threshold of 20% (current packet loss at 100%)
Remote access: online
================

Labels (2)
Tags (1)
0 Karma
1 Solution

sylim_splunk
Splunk Employee
Splunk Employee

When I configured my test environment with this firewall settings, I could reproduce the same symptom
Then the installation completes successfully with the firewall settings below;


allow 53 for TCP
allow 53 for UDP
allow 8472 for UDP

View solution in original post

0 Karma

sylim_splunk
Splunk Employee
Splunk Employee

When I configured my test environment with this firewall settings, I could reproduce the same symptom
Then the installation completes successfully with the firewall settings below;


allow 53 for TCP
allow 53 for UDP
allow 8472 for UDP

0 Karma
Get Updates on the Splunk Community!

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...

SignalFlow: What? Why? How?

What is SignalFlow? Splunk Observability Cloud’s analytics engine, SignalFlow, opens up a world of in-depth ...

Federated Search for Amazon S3 | Key Use Cases to Streamline Compliance Workflows

Modern business operations are supported by data compliance. As regulations evolve, organizations must ...