Hello, we have an indexer cluster of two peers with replication and search factors set to 2.
The latest rolling restart has not progressed for four hours; the second peer is stuck in the "Reassigning primaries" status.
Four hours ago we initiated a searchable rolling restart from the master server's GUI. The first indexer went down for restart and did not return to operation for 10 minutes. After logging in as root and running "su splunk; /opt/splunk/bin/splunk status", we saw the following:
splunkd 26239 was not running.
Stopping splunk helpers...
Repeating "/opt/splunk/bin/splunk status" returned the output:
splunkd is not running.
We then started the Splunk application by running "/opt/splunk/bin/splunk start". The server came up and the peer rejoined the cluster.
From that moment the second peer has been in the "Reassigning primaries" status, and nothing has happened since.
The cluster is in maintenance mode and no fixup tasks are being performed; there are currently 6k+ of them pending. Search and replication factors are not met for almost all production indexes, and 8 of them are not fully searchable.
How can we finish the rolling restart or at least cancel it?
Thank you for your time and assistance!
The "searchable" rolling restart feature frequently fails, leaving indexers stuck in a "Reassigning primaries" state. It's so prone to failure, and the impact of failure is so severe, that you're better off never doing it.
If you want to do a searchable restart, you must have enough indexers for it. I assume it needs to keep SF and RF valid for the whole duration of the restart, so the minimum number of indexers must be at least max(SF, RF) + 1, maybe even more? I haven't tried this again after the first failed case 😞 When you are stuck with it, the easiest way out of the situation is to restart the CM.
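To make that rule of thumb concrete, here is a small sketch that just evaluates max(SF, RF) + 1. The formula is the assumption stated above, not a documented Splunk requirement, and `min_peers` is a hypothetical helper name.

```shell
# min_peers SF RF: prints max(SF, RF) + 1, the assumed minimum peer count
# for a searchable rolling restart (an assumption, not an official figure).
min_peers() {
    sf=$1
    rf=$2
    if [ "$sf" -gt "$rf" ]; then
        max=$sf
    else
        max=$rf
    fi
    echo $((max + 1))
}

# The cluster in the question: SF=2, RF=2
min_peers 2 2   # prints 3
```

By this estimate the two-peer cluster from the question is one indexer short.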
I'm currently on 8.0.7, but I have had this same issue many times before across many other versions. Each time I resolved it by restarting the Splunk service on the indexer that shows the status "reassigning primaries".
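A minimal sketch of that workaround, assuming the default /opt/splunk install path used earlier in the thread. The function names are hypothetical wrappers around the standard `splunk restart` and `splunk show cluster-status` CLI commands; nothing runs until you call them.

```shell
# Assumed install location; override SPLUNK_HOME if yours differs.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# Run on the indexer stuck in "Reassigning primaries" (as the splunk user).
restart_stuck_peer() {
    "$SPLUNK_HOME/bin/splunk" restart
}

# Run on the cluster manager afterwards to see whether the peer status changed.
# The CLI will ask for credentials if you are not already authenticated.
show_cluster_status() {
    "$SPLUNK_HOME/bin/splunk" show cluster-status
}
```

Call `restart_stuck_peer` on the affected indexer first, then `show_cluster_status` on the manager to see whether the peer has left the "Reassigning primaries" state.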