Deployment Architecture

Peers don't come back online after a restart signal from the master. Why?


Maintenance mode was enabled on the cluster master, all the peer nodes were started, and then maintenance mode was disabled.

While the indexer was still creating buckets, the peer node received a restart signal from the master node.
The peer node began restarting and shut down, but it never came back online.

splunkd logs from one of the peer nodes:

12-15-2017 10:03:59.760 +0000 INFO DatabaseDirectoryManager - idx=internal Writing a bucket manifest in hotWarmPath='/cache1/splunkdata/internaldb/db', pendingBucetUpdates=1 . Reason='Updating manifest: bucketUpdates=1'
12-15-2017 10:03:59.762 +0000 INFO DatabaseDirectoryManager - Finished writing bucket manifest in hotWarmPath=/cache1/splunkdata/internaldb/db
12-15-2017 10:04:00.194 +0000 INFO CMSlave - master has instructed peer to restart
12-15-2017 10:04:00.425 +0000 INFO CMSlave - detected restart is required, will restart node
12-15-2017 10:04:00.739 +0000 INFO PipelineComponent - Performing early shutdown tasks
12-15-2017 10:04:00.740 +0000 INFO IndexProcessor - handleSignal : Disabling streaming searches.
12-15-2017 10:04:00.740 +0000 INFO IndexProcessor - request state change from=RUN to=SHUTDOWN_SIGNALED
12-15-2017 10:04:05.749 +0000 INFO loader - Shutdown HTTPDispatchThread
12-15-2017 10:04:05.749 +0000 INFO ShutdownHandler - Shutting down splunkd
12-15-2017 10:04:05.749 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevelBegin"
12-15-2017 10:04:05.749 +0000 INFO CMSlave - shutdown initiated restart=1
12-15-2017 10:04:05.749 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_JustBeforeKVStore"
12-15-2017 10:04:05.750 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevelKVStore"
12-15-2017 10:04:07.070 +0000 ERROR MongodRunner - Did not get EOF from mongod after 1 second(s).
12-15-2017 10:04:07.070 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_Thruput"
12-15-2017 10:04:07.070 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_TcpInput1"
12-15-2017 10:04:07.070 +0000 INFO TcpInputProc - Running shutdown level 1. Closing listening ports.
12-15-2017 10:04:07.070 +0000 INFO TcpInputProc - Shutting down listening ports
12-15-2017 10:04:07.070 +0000 INFO TcpInputProc - Stopping IPv4 port 9997


Re: Peers don't come back online after a restart signal from the master. Why?


If you are running Splunk as a systemd service from a unit file, check the value set for the Restart= option.

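If the value is on-failure, systemd will not restart the service after a clean exit (exit code 0, or termination by SIGHUP, SIGINT, SIGTERM or SIGPIPE). With on-failure, the service is restarted only when the process exits with a non-zero exit code, is terminated by any other signal (including on core dump), when an operation (such as a service reload) times out, or when the configured watchdog timeout is triggered. The peer's log shows splunkd going through an orderly shutdown (CMSlave - shutdown initiated restart=1), so if the process then exits cleanly, on-failure will not start it again.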
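To make the on-failure vs always difference concrete, here is a minimal Python sketch of systemd's restart decision, modelled on the Restart= table in the systemd.service(5) man page. It is a simplification under stated assumptions: it ignores SuccessExitStatus= and RestartPreventExitStatus=, and the fact that a unit stopped with systemctl stop is never restarted. The names classify_exit and would_restart are hypothetical helpers, not part of systemd or Splunk.

```python
# Signals systemd counts as a "clean" termination, alongside exit code 0.
CLEAN_SIGNALS = {"SIGHUP", "SIGINT", "SIGTERM", "SIGPIPE"}

# Exit causes that trigger a restart for each Restart= value
# (simplified from the table in systemd.service(5)).
RESTART_TABLE = {
    "no": set(),
    "always": {"clean", "unclean-exit", "unclean-signal", "timeout", "watchdog"},
    "on-success": {"clean"},
    "on-failure": {"unclean-exit", "unclean-signal", "timeout", "watchdog"},
    "on-abnormal": {"unclean-signal", "timeout", "watchdog"},
    "on-abort": {"unclean-signal"},
    "on-watchdog": {"watchdog"},
}


def classify_exit(exit_code=None, sig=None, timed_out=False, watchdog=False):
    """Hypothetical helper: map how the main process ended to an exit cause."""
    if watchdog:
        return "watchdog"
    if timed_out:
        return "timeout"
    if sig is not None:
        return "clean" if sig in CLEAN_SIGNALS else "unclean-signal"
    return "clean" if exit_code == 0 else "unclean-exit"


def would_restart(policy, **how_it_ended):
    """Hypothetical helper: would systemd restart the unit under this policy?"""
    return classify_exit(**how_it_ended) in RESTART_TABLE[policy]


# A splunkd that shuts itself down and exits with status 0:
print(would_restart("on-failure", exit_code=0))  # False: treated as a clean stop
print(would_restart("always", exit_code=0))      # True: always restarts
```

The point of the sketch is the "clean" row: only always and on-success restart after a clean exit, which is why the answer suggests always here.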

Try setting it to always and see what happens.

If you use always, make sure that ExecStop=/opt/splunk/bin/splunk stop is also in your unit file.
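If you manage many peers, a quick way to audit their unit files for these two settings is a small script. Below is a minimal Python sketch using the standard library's configparser; check_splunk_unit is a hypothetical helper, and the sample unit text is illustrative only, not a recommended Splunk unit file. It assumes the unit file is simple INI-style text (systemd syntax has corner cases, such as line continuations, that configparser does not model).

```python
import configparser

# Restart= values that do NOT restart the service after a clean exit.
NO_RESTART_ON_CLEAN_EXIT = {"no", "on-failure", "on-abnormal", "on-abort", "on-watchdog"}


def check_splunk_unit(unit_text):
    """Hypothetical helper: return warnings about Restart=/ExecStop= in a unit file."""
    # strict=False tolerates keys systemd allows to repeat (e.g. ExecStartPre=);
    # interpolation=None stops '%' specifiers such as %i from being parsed.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive, as systemd does
    parser.read_string(unit_text)

    if not parser.has_section("Service"):
        return ["no [Service] section found"]
    service = parser["Service"]

    warnings = []
    restart = service.get("Restart", "no")  # systemd's default is Restart=no
    if restart in NO_RESTART_ON_CLEAN_EXIT:
        warnings.append(f"Restart={restart} does not restart splunkd after a clean exit")
    if restart == "always" and "ExecStop" not in service:
        warnings.append("Restart=always without ExecStop=; add ExecStop=/opt/splunk/bin/splunk stop")
    return warnings


# Illustrative sample unit, not taken from the thread.
sample = """\
[Unit]
Description=Splunk (illustrative sample)

[Service]
ExecStart=/opt/splunk/bin/splunk start
Restart=on-failure
"""
for warning in check_splunk_unit(sample):
    print(warning)
```

The check mirrors the advice in this thread only; it does not validate the rest of the unit file.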
