We have an 8-node SHC; Splunk was upgraded from version 6.2.2.1 to 6.3.3.
(a) The SHC captain logs the following errors every 5 seconds:
03-01-2016 09:12:54.909 -0800 ERROR KVStorageProvider - Could not update replica set configuration, error domain 1, err code 12, Error message: Requested PRIMARY node is not available.
03-01-2016 09:12:54.909 -0800 ERROR KVStoreConfigurationProvider - Failed to update replica set configuration
(b) Polling the services/server/info REST endpoint, the SHC captain returns:
<s:key name="kvStoreStatus">failed</s:key>
And SHC members return:
<s:key name="kvStoreStatus">starting</s:key>
(c) The mongod.log on 3 of the SHC members reports no local replica set configuration:
2016-02-29T08:24:00.380Z I REPL [initandlisten] Did not find local replica set configuration document at startup;
NoMatchingDocument Did not find replica set configuration document in local.system.replset
The mongod.log on the other 5 SHC members shows a replica set configuration containing only 5 hosts. All hosts have the KV store port configured as 8201; however, 3 of the hosts still appear in the configuration on the default port 8191.
2016-02-29T10:04:04.237Z I REPL [ReplicationExecutor] New replica set config in use: { _id: "splunkrs", version: 265, members: [ { _id: 130, host: "17.142.230.35:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "262C5C0E-47FB-4DC9-965B-3588670626AD" }, slaveDelay: 0, votes: 1 }, { _id: 126, host: "17.142.230.33:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "686A134F-C51D-41F6-B6E2-E82E11D32BFA" }, slaveDelay: 0, votes: 1 }, { _id: 131, host: "17.142.229.32:8201", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "81380A60-45E5-4D93-A660-2740A5DB4638" }, slaveDelay: 0, votes: 1 }, { _id: 132, host: "17.142.229.34:8201", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "C4E3DAED-9F32-4E0A-A916-15AAFF6F983B" }, slaveDelay: 0, votes: 1 }, { _id: 129, host: "17.142.230.34:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "DF30C540-3D5B-40D0-9278-8D9B9E332598" }, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
We are not using the KV store. What should be done to get rid of these errors?
If you do not want to / cannot stop all members at once (which is reasonable), you can take the following steps on each member, one at a time:
a) Stop Splunk.
b) Back up $SPLUNK_DB/kvstore (by default $SPLUNK_HOME/var/lib/splunk/kvstore).
c) Clean up the KV store: {{splunk clean kvstore --local}}
d) Disable the KV store in server.conf; see http://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf
e) Start this member.
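The KV store is toggled via the [kvstore] stanza in server.conf (the file documented at the Serverconf link above). A minimal fragment for disabling it, assuming the usual etc/system/local location:

```
# $SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
disabled = true
```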
Then, once all members have been cleaned, again on each member one at a time:
a) Stop Splunk.
b) Enable the KV store in server.conf.
c) Start Splunk.
d) Verify that the KV store reaches status ready on each member (for example, {{curl -s -k https://localhost:8089/services/server/info | grep kvStoreStatus}}), and that no further errors appear in splunkd.log on the SHC captain.
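The verification in step (d) can be scripted across members. A minimal sketch (the hostnames and admin:changeme credentials are placeholders for your environment):

```shell
#!/bin/sh
# extract_status pulls the kvStoreStatus value out of a services/server/info response.
extract_status() {
  grep -o '<s:key name="kvStoreStatus">[^<]*' | sed 's/.*>//'
}

# Live check (uncomment and adjust hosts/credentials to your environment):
# for host in shc-member1 shc-member2; do
#   printf '%s: ' "$host"
#   curl -s -k -u admin:changeme "https://$host:8089/services/server/info" | extract_status
# done

# Demonstrate the parser on a canned response fragment:
echo '<s:key name="kvStoreStatus">ready</s:key>' | extract_status   # prints: ready
```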