On a three-node SH cluster, why doesn't one member come back up?

HIBE151
Explorer

Hello everyone,

We have a three-node SH cluster where one member does not come back up.
When we restart the Splunk daemon, it gets stuck on the very last task, starting the web server.
After a while we get: WARNING: web interface does not seem to be available!

On the newly elected captain node I checked the KV store status for the affected host:

configVersion : -1
hostAndPort : <ip>:8191
lastHeartbeat : Mon Apr 15 ....
lastHeartbeatRecv :  ZERO_TIME
lastHeartbeatRecvSec: 0
.
.
.
replicationStatus : Down
uptime : 0
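
With more members the status output gets long, so a small filter can help pick out the ones reported as Down. This is a minimal sketch, assuming the `key : value` layout shown above and that each member's `hostAndPort` line comes before its `replicationStatus` line; `down_members` is a hypothetical helper name, not a Splunk command.

```shell
# Hypothetical helper: print the hostAndPort of every KV store member whose
# replicationStatus is "Down". Reads `splunk show kvstore-status` text on stdin.
# Assumes "key : value" lines, with hostAndPort before replicationStatus.
down_members() {
  awk '
    { sub(/[ \t\r]+$/, "") }                 # trim trailing whitespace
    /^[ \t]*hostAndPort[ \t]*:/ {
      sub(/^[^:]*:[ \t]*/, "")               # drop the "hostAndPort : " prefix
      host = $0
      next
    }
    /^[ \t]*replicationStatus[ \t]*:/ {
      sub(/^[^:]*:[ \t]*/, "")               # drop the "replicationStatus : " prefix
      if ($0 == "Down") print host
    }
  '
}
```

Usage on a working member would look like `splunk show kvstore-status | down_members`.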

When I search the _internal logs for errors, I see the following messages in the mongod logs:

REPL [ReplicationExecutor] Error in heartbeat request to <own-ip-address>.8191; HostUnreachable: Connection refused
ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to  <own-ip-address>:8191 - HostUnreachable: Connection refused

Should this ip address be the address of the captain?

The splunkd logs don't indicate any errors.
To me it looks like the synchronization of the KV store isn't working.

I've tried this already, but it didn't help:
https://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/ResyncKVstore

Any suggestions? Thanks!

skalliger
Motivator

You can also run splunk clean raft on the affected member only. See if that helps.
Are your two other members working fine? What is the output of splunk show shcluster-status and splunk show kvstore-status on the working members/captain?
If only one member is misbehaving and splunk clean raft doesn't do the job, I'd suggest simply removing it from the cluster, cleaning it, and adding it again.
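
The commands mentioned above could be strung together roughly like this. It is only a sketch under assumptions: the `/opt/splunk` default path, the `DRY_RUN` switch, and the function names are all hypothetical, and the flags should be checked against the documentation for your Splunk version before running anything for real.

```shell
# Sketch only. The install path is an assumption; adjust SPLUNK_HOME as needed.
SPLUNK="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"

# Hypothetical safety switch: with DRY_RUN=1 (the default) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# On a working member or the captain: collect the cluster and KV store state.
show_cluster_state() {
  run "$SPLUNK" show shcluster-status
  run "$SPLUNK" show kvstore-status
}

# On the affected member only: stop it, clear its raft state, start it again.
clean_raft_on_member() {
  run "$SPLUNK" stop
  run "$SPLUNK" clean raft
  run "$SPLUNK" start
}

# On a working member, if cleaning raft did not help: remove the broken member.
# $1 is the management URI of the broken member, e.g. https://<host>:8089
remove_member() {
  run "$SPLUNK" remove shcluster-member -mgmt_uri "$1"
}
```

Calling `clean_raft_on_member` as-is only prints the three commands; setting `DRY_RUN=0` before sourcing the script would actually run them. Re-adding the member afterwards is left out on purpose, since the clean-up step before it is destructive and worth reading up on first.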

Skalli

HIBE151
Explorer

Can I run the commands in the "Fix the entire cluster" section of the page below without worrying about breaking the other nodes in the cluster?
https://docs.splunk.com/Documentation/Splunk/6.5.2/DistSearch/Handleraftissues#Fix_the_entire_cluste...
Any experience with that?
