Deployment Architecture

Why don't peers come back up after the restart signal from the master?

GrishmaG
New Member

Maintenance mode was enabled on the cluster master. All the peer nodes were started, and then maintenance mode was disabled.

While indexer bucket creation was in progress, the peer node received a restart signal from the master node.
The peer node starts the restart process and shuts down, but it doesn't come back up again.

splunkd logs from one of the peer nodes:

12-15-2017 10:03:59.760 +0000 INFO DatabaseDirectoryManager - idx=_internal Writing a bucket manifest in hotWarmPath='/cache1/splunkdata/_internaldb/db', pendingBucetUpdates=1 . Reason='Updating manifest: bucketUpdates=1'
12-15-2017 10:03:59.762 +0000 INFO DatabaseDirectoryManager - Finished writing bucket manifest in hotWarmPath=/cache1/splunkdata/_internaldb/db
12-15-2017 10:04:00.194 +0000 INFO CMSlave - master has instructed peer to restart
12-15-2017 10:04:00.425 +0000 INFO CMSlave - detected restart is required, will restart node
12-15-2017 10:04:00.739 +0000 INFO PipelineComponent - Performing early shutdown tasks
12-15-2017 10:04:00.740 +0000 INFO IndexProcessor - handleSignal : Disabling streaming searches.
12-15-2017 10:04:00.740 +0000 INFO IndexProcessor - request state change from=RUN to=SHUTDOWN_SIGNALED
12-15-2017 10:04:05.749 +0000 INFO loader - Shutdown HTTPDispatchThread
12-15-2017 10:04:05.749 +0000 INFO ShutdownHandler - Shutting down splunkd
12-15-2017 10:04:05.749 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_Begin"
12-15-2017 10:04:05.749 +0000 INFO CMSlave - shutdown initiated restart=1
12-15-2017 10:04:05.749 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_JustBeforeKVStore"
12-15-2017 10:04:05.750 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_KVStore"
12-15-2017 10:04:07.070 +0000 ERROR MongodRunner - Did not get EOF from mongod after 1 second(s).
12-15-2017 10:04:07.070 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_Thruput"
12-15-2017 10:04:07.070 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_TcpInput1"
12-15-2017 10:04:07.070 +0000 INFO TcpInputProc - Running shutdown level 1. Closing listening ports.
12-15-2017 10:04:07.070 +0000 INFO TcpInputProc - Shutting down listening ports
12-15-2017 10:04:07.070 +0000 INFO TcpInputProc - Stopping IPv4 port 9997

strive
Influencer

If you are running the Splunk service from a systemd unit file, check the value that is set for the Restart option.

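If the value is on-failure, the service won't be restarted after a clean exit. The master-initiated restart in the log above looks like such a clean shutdown, which would explain why the peer stays down. With on-failure, the service is restarted only when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding SIGHUP, SIGINT, SIGTERM, and SIGPIPE, which systemd counts as clean exits), when an operation (such as service reload) times out, or when the configured watchdog timeout is triggered.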

Try setting it to always and see what happens.

If you are going to use always, then ensure that ExecStop=/opt/splunk/bin/splunk stop is also part of your unit file.
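
As a rough sketch of the change described above, the [Service] section of the unit file might end up looking like the fragment below. Only the Restart and ExecStop lines come from this answer. Everything else in your unit file (ExecStart, User, Type, and so on) is assumed to stay exactly as it is today, and the /opt/splunk path assumes a default install location.

```ini
[Service]
# Keep your existing directives (ExecStart=, User=, Type=, ...) unchanged.

# Restart on every exit, including a clean one. With on-failure,
# a clean exit is not followed by a restart.
Restart=always

# Explicit stop command, as recommended when using Restart=always.
# Path assumes Splunk is installed under /opt/splunk.
ExecStop=/opt/splunk/bin/splunk stop

# After editing the unit file, run "systemctl daemon-reload" so that
# systemd picks up the change.
```

You can then trigger a restart from the master again and watch whether the peer comes back up on its own.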
