All Apps and Add-ons

topics/models in UBA have gone into a pending state even after stop-all and start-all

dchoi_splunk
Splunk Employee

After running a stop-all and start-all to restart, various topics/models have gone into a pending state and are not getting data. The data source connectors also appear to have stopped receiving data: no EPS is displayed on the home page, and the number of processed events does not appear to be growing.
A stop-all/start-all has been run a number of times since the issue started to try to rectify the problem.
What should I look into further?


dchoi_splunk
Splunk Employee

If you see errors like the following in the output of the health check scripts:

Errors:

kubelet journalctl:

kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
kubelet_node_status.go:82] Attempting to register node YOUR_FQDN_SERVERNAME
kubelet_node_status.go:106] Unable to register node "YOUR_FQDN_SERVERNAME" with API server: nodes "YOUR_FQDN_SERVERNAME" is forbidden: node "short_hostname" cannot modify node "YOUR_FQDN_SERVERNAME"
kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
kubelet_node_status.go:82] Attempting to register node YOUR_FQDN

Output of health check:

concern summary: Checking YOUR_FQDN_SERVERNAME first ...

                                                         McAfee_ePO           | Splunk | Processing |       | SPLUNK/DIRECT     | 2019-01-15 05:18:10 | 0          | 0           | 0       |      949874 |                 0 |             485746 |           49 |    <== significant failed/skipped events; review datasource SPL
                              eps:                      0   <== no response from redis 'YOUR_FQDN'
                              status not OK ...            <== check UI system health monitor for errors
                                                        splunkuba     analyticsaggregator-rc-v6csh             0/1       Pending   0          14m       <none>          <none>   <== pod 'analyticsaggregator-rc-v6csh' is 'Pending'; not 'Running'
                                                        splunkuba     analyticsviewsbuilder-rc-nwffg           0/1       Pending   0          14m       <none>          <none>   <== pod 'analyticsviewsbuilder-rc-nwffg' is 'Pending'; not 'Running'
                                                        splunkuba     analyticswriter-rc-jscjf                 0/1       Pending   0          14m       <none>          <none>   <== pod 'analyticswriter-rc-jscjf' is 'Pending'; not 'Running'
                                                        splunkuba     anomalyaggregationmodel-rc-j968d         0/1       Pending   0          14m       <none>          <none>   <== pod 'anomalyaggregationmodel-rc-j968d' is 'Pending'; not 'Running'
                                                        splunkuba     devicetopic-modelgroup01-rc-d5ljq        0/1       Pending   0          14m       <none>          <none>   <== pod 'devicetopic-modelgroup01-rc-d5ljq' is 'Pending'; not 'Running'
                                                        splunkuba     devicetopic-modelgroup01-rc-z5mzc        0/1       Pending   0          14m       <none>          <none>   <== pod 'devicetopic-modelgroup01-rc-z5mzc' is 'Pending'; not 'Running'
                                                        splunkuba     domaintopic-modelgroup01-rc-8pvcx        0/1       Pending   0          14m       <none>          <none>   <== pod 'domaintopic-modelgroup01-rc-8pvcx' is 'Pending'; not 'Running'
                                                        splunkuba     domaintopic-modelgroup01-rc-mmzhw        0/1       Pending   0          14m       <none>          <none>   <== pod 'domaintopic-modelgroup01-rc-mmzhw' is 'Pending'; not 'Running'

...
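As a quick triage aid, the Pending pods in output like the above can be pulled out with a small filter (a hypothetical helper, not part of the UBA health check scripts):

```shell
# Hypothetical helper: print the names of pods stuck in Pending.
# In `kubectl get pods -o wide --all-namespaces` output, column 2 is
# the pod name and column 4 is the STATUS.
list_pending() {
  awk '$4 == "Pending" { print $2 }'
}

# Intended usage (pipe the real output in):
#   sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -o wide --all-namespaces | list_pending
```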

The problem is how the hostnames are set: they use the FQDN rather than the short hostname, i.e.:
hostnamectl status: Static hostname: YOUR_FQDN_SERVERNAME

e.g. /etc/hosts (on the master node):

127.0.0.1 localhost
IP_ADDRESS YOUR_FQDN_SERVERNAME (e.g. uba1.splunk.com)
IP_ADDRESS YOUR_FQDN_SERVERNAME (e.g. uba2.splunk.com)
IP_ADDRESS YOUR_FQDN_SERVERNAME (e.g. uba3.splunk.com)
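The kubelet error above comes from this mismatch: kubelet attempts to register the node under the static (FQDN) hostname, while its node credentials are bound to the short name. A minimal sketch of the mismatch (hostnames are illustrative, not your actual values):

```shell
# Illustrative only: the shortname-vs-FQDN mismatch behind the error
#   node "short_hostname" cannot modify node "YOUR_FQDN_SERVERNAME"
static_hostname="uba1.splunk.com"      # what `hostnamectl status` reports
short_name="${static_hostname%%.*}"    # strip everything after the first dot

if [ "$static_hostname" != "$short_name" ]; then
  echo "mismatch: node '$short_name' cannot modify node '$static_hostname'"
fi
```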

To resolve the issue:

1. Check the current status of the containers
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes -o wide --all-namespaces
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -o wide --all-namespaces

2. Stop containers and services
/opt/caspida/bin/Caspida stop-containers
/opt/caspida/bin/Caspida stop-container-services  # command not in 4.1; added in 4.2

3. Check the current status of the containers
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes -o wide --all-namespaces
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -o wide --all-namespaces

4. Stop kubelet and docker on all nodes
sudo service kubelet stop && sudo service docker stop
ssh uba2 "sudo service kubelet stop && sudo service docker stop"
ssh uba3 "sudo service kubelet stop && sudo service docker stop"

5. Check the current status of the containers
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes -o wide --all-namespaces
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -o wide --all-namespaces

6. Update hostnames to use short names
sudo hostnamectl set-hostname uba1
ssh uba2 "sudo hostnamectl set-hostname uba2"
ssh uba3 "sudo hostnamectl set-hostname uba3"

7. Restart docker and kubelet on all nodes
sudo service docker start && sudo service kubelet start
ssh uba2 "sudo service docker start && sudo service kubelet start"
ssh uba3 "sudo service docker start && sudo service kubelet start"

8. Start containers and services
/opt/caspida/bin/Caspida start-container-services  # command not in 4.1; added in 4.2
/opt/caspida/bin/Caspida start-containers

9. Check the current status of the containers
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes -o wide --all-namespaces
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -o wide --all-namespaces

10. The pods should now transition from Pending to Running.
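To confirm the final step without scanning the pod table by eye, the listing can be fed through a small check (a hypothetical helper, not a UBA command):

```shell
# Hypothetical check: succeed (exit 0) only when no pod is still Pending.
no_pending() {
  ! grep -qw "Pending"
}

# Intended usage:
#   sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -o wide --all-namespaces \
#     | no_pending && echo "all pods out of Pending"
```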