We recently upgraded our cluster from Splunk 8.1.0.1 to Splunk 9.0.2, and the KV store on the SH cluster was manually upgraded to WiredTiger.
We could see that the cluster manager and most peer nodes were automatically upgraded to WiredTiger as well; however, some indexer peers failed to migrate. Please find the related error messages from mongod.log below.
It's not clear why exactly this happened. Is there a manual way to recover and migrate? (The command we used on the search heads is shown after the log.)
---------------
2023-01-12T08:44:16.890Z I STORAGE [initandlisten] exception in initAndListen: Location28662: Cannot start server. Detected data files in /usr/local/akamai/splunk/var/lib/splunk/kvstore/mongo created by the 'mmapv1' storage engine, but the specified storage engine was 'wiredTiger'., terminating
2023-01-12T08:44:16.890Z I REPL [initandlisten] Stepping down the ReplicationCoordinator for shutdown, waitTime: 10000ms
2023-01-12T08:44:16.890Z I NETWORK [initandlisten] shutdown: going to close listening sockets...
2023-01-12T08:44:16.890Z I NETWORK [initandlisten] Shutting down the global connection pool
2023-01-12T08:44:16.890Z I - [initandlisten] Killing all operations for shutdown
2023-01-12T08:44:16.890Z I NETWORK [initandlisten] Shutting down the ReplicaSetMonitor
2023-01-12T08:44:16.890Z I CONTROL [initandlisten] Shutting down free monitoring
2023-01-12T08:44:16.890Z I FTDC [initandlisten] Shutting down full-time data capture
2023-01-12T08:44:16.890Z I STORAGE [initandlisten] Shutting down the HealthLog
2023-01-12T08:44:16.890Z I - [initandlisten] Dropping the scope cache for shutdown
2023-01-12T08:44:16.890Z I CONTROL [initandlisten] now exiting
2023-01-12T08:44:16.890Z I CONTROL [initandlisten] shutting down with code:100
---------------
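For reference, this is roughly what we ran on the search heads to migrate manually (the storage-engine migration command from the Splunk 9.0 docs; exact invocation quoted from memory, so please verify against the docs before running it anywhere):
---------------
$SPLUNK_HOME/bin/splunk migrate kvstore-storage-engine --target-engine wiredTiger
$SPLUNK_HOME/bin/splunk show kvstore-status --verbose   # check which storage engine is reported afterwards
---------------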
Cluster details – Splunk multisite
-------------------------
4 SH (site1)
4 SH (site2)
11 IDX (site1)
11 IDX (site2)
Master1 (site1)
Master2 (standby at site2)
Please correct me if I am wrong, but I thought per Splunk's documentation the recommendation is to run the KV Store on search heads and not on peers, aka indexers. I know this does not solve your problem, but maybe it doesn't have to: maybe just run the KV Store on the search heads.
Hey, as far as I understand, we could choose to disable the KV store on non-SH machines, but there is nothing really stopping us from running it on other machines. They will be single-instance KV stores, and the KV store is enabled by default on all instances. Disabling it should just be a server.conf change; see the sketch below.
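Something like this on each indexer, followed by a restart (a minimal sketch based on the server.conf spec; the path is the usual local config location, so adjust it to however you manage configs):
---------------
# $SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
disabled = true
---------------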
In my case, it worked fine on most indexer peers; it's just 3 out of 20 indexers that have this issue.
"In my case, it worked fine on most indexer peers and its just 3 out of 20 indexers that has this issue."
Ah yes, that sounds like Splunk, ha ha
Here's what I would do (rough shell version after the list):
Stop Splunk
Clear out splunkd.log (truncate it rather than deleting the file)
Start Splunk
Open up the newly populated splunkd.log
Start at the top and take your time going through it line by line, quickly glancing at what it is doing, until it gets to the KV store part. My money is on a certificate issue.
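As a shell sketch (assuming a default install path; your mongod.log suggests yours is /usr/local/akamai/splunk, so set SPLUNK_HOME accordingly):
---------------
SPLUNK_HOME=/opt/splunk                       # assumption: set to your actual install path
$SPLUNK_HOME/bin/splunk stop
: > $SPLUNK_HOME/var/log/splunk/splunkd.log   # truncate in place so ownership/permissions are kept
$SPLUNK_HOME/bin/splunk start
less $SPLUNK_HOME/var/log/splunk/splunkd.log  # read from the top; slow down around the KV store / mongod lines
---------------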