Knowledge Management

KV store errors, does not start

sat94541
Communicator

During the "Guided Setup" I receive the following error:

Key value store must be enabled. Please enable it.

After bypassing the prerequisites, I get the following:

Error in 'inputlookup' command: External command based lookup 'windows_event_system' is not available because KV Store initialization has not completed yet. Please try again later

I see that the kvstore service is listening on port 8191/tcp, and I see mongod running:

splunk 23324 23287 0 07:39 ? 00:00:22 mongod --dbpath=/opt/splunk/var/lib/splunk/kvstore/mongo --port=8191 --timeStampFormat=iso8601-utc --smallfiles --oplogSize=1000 --keyFile=/opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key --setParameter=enableLocalhostAuthBypass=0 --replSet=splunkrs --sslAllowInvalidHostnames --sslMode=preferSSL --sslPEMKeyFile=/opt/splunk/etc/auth/server.pem --sslPEMKeyPassword=xxxxxxxx --nounixsocket

The splunkd.log file has the following entries:
10-20-2015 04:25:48.433 -0600 ERROR MongodRunner - Did not get EOF from mongod after 1 second(s).
10-20-2015 07:22:20.029 -0600 ERROR MongodRunner - Did not get EOF from mongod after 1 second(s).

The mongod.log has the following messages:

grep -i "100.124.XX.XX" m*.log
2015-10-20T10:26:34.176Z W NETWORK [ReplicationExecutor] Failed to connect to 100.124.31.2:8191 after 5000 milliseconds, giving up.
2015-10-20T10:26:34.176Z W REPL [ReplicationExecutor] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound No host described in new configuration 1 for replica set splunkrs maps to this node" while validating { _id: "splunkrs", version: 1, members: [ { _id: 0, host: "100.124.XX.XX:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "6916241A-2612-4754-B6D5-F865473B7BA3" }, slaveDelay: 0, votes: 1 }, { _id: 1, host: "100.124.31.35:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "785C1A9C-4F55-430C-AAA0-62D51D2E3A67" }, slaveDelay: 0, votes: 1 }, { _id: 2, host: "100.124.31.2:8191", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: { all: "all", instance: "C509F6E7-EBF9-4628-AC62-4961CEE51329" }, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }

NOTE: The mongod log refers to the IP XXX.XXX.XX.XX, but the IP for the instance is YYY.YYY.YY.YY.

One thing that came to mind just now: all three search heads were renamed when they were moved behind a VIP, and the IP shown above (i.e., XXX.XXX.XX.XX) is the old IP.


rbal_splunk
Splunk Employee

In Splunk versions prior to 6.3 there is an open bug, SPL-105440: an IP change for the search head cluster breaks the KV store.

Here is a sequence of steps you can try to resolve this issue.

1) If the DNS names have not changed:
2) On each SHC member, set the following in
$SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
replication_host =

For each member, set kvstore/replication_host (see http://docs.splunk.com/Documentation/Splunk/6.2.3/admin/Serverconf) to the hostname of that machine (by default, 6.2.x uses IP addresses; 6.3 uses hostnames by default). Set this and restart the members one at a time, and after each restart wait until the SHC is ready and the KV store reports a ready status before moving on.
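As a concrete sketch of that stanza (the hostname shc-member1.example.com is hypothetical; substitute each member's own DNS name, and on a real member append to $SPLUNK_HOME/etc/system/local/server.conf instead of a temp file):

```shell
# Append a [kvstore] stanza carrying this member's hostname.
# Writing to a temp copy here so the sketch is safe to run anywhere;
# the hostname below is a placeholder, not a value from this thread.
conf=$(mktemp)
cat >> "$conf" <<'EOF'
[kvstore]
replication_host = shc-member1.example.com
EOF
cat "$conf"
```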
Here is the sequence of steps to follow.

Option 1:
1.1) Shut down all SHC members.
1.2) On one SHC member, set the following in
$SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
replication_host =
1.3) Restart the first SHC member, where you just set replication_host.
1.4) Wait a few seconds (30-60s) and check the status of the KV store:

curl -k -s https://localhost:8089/services/server/info | grep kvStore

If you see status "starting", it is possible that the SHC is not bootstrapped, or that it still needs some time to replicate connection strings between SHC members.
If you see "disabled", the KV store may be disabled on the current member or on the whole SHC.
If you see status "failed", it is time to investigate.
If you see status "ready", all is good.
1.5) If the first SHC member's KV store status is ready, proceed and make the following change on the 2nd SHC member:
$SPLUNK_HOME/etc/system/local/server.conf
[kvstore]
replication_host =
1.6) Restart the 2nd SHC member, where you just set replication_host.
1.7) Wait a few seconds (30-60s) and check the status of the KV store:

curl -k -s https://localhost:8089/services/server/info | grep kvStore

Repeat the above steps for all remaining members.
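The KV store status checks above can be scripted. A minimal sketch, assuming the XML shape of the /services/server/info response (the sample line below is illustrative; on a live member you would pipe the curl output into the function instead):

```shell
# Pull the kvStoreStatus value out of a server/info response line such as:
#   <s:key name="kvStoreStatus">ready</s:key>
kv_status() {
  printf '%s\n' "$1" | sed -n 's/.*name="kvStoreStatus">\([^<]*\)<.*/\1/p'
}

sample='<s:key name="kvStoreStatus">ready</s:key>'
kv_status "$sample"
```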

Option 2: Use this only if you fail to resolve the issue using Option 1 above.

2.1) Do backups!
2.2) Stop all instances.
2.3) Clean the folder $SPLUNK_HOME/var/run/splunk/_raft/ on all members (this removes the information about the other members of the SHC).
2.4) Clean the cluster information in the KV store on all instances (6.3 adds a "splunk clean kvstore --cluster" command):

rm -fR $SPLUNK_HOME/var/lib/splunk/kvstore/mongo/local.*
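A hedged sketch combining the backup (step 2.1) with the cleanup of _raft and the local.* files on one stopped member; the SPLUNK_HOME default is the path seen in this thread, and the backup location is an assumption:

```shell
# Back up, then remove, the SHC raft state and the KV store's local.*
# replica-set files on this member (run only while Splunk is stopped).
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}
backup=$(mktemp -d)
cp -a "$SPLUNK_HOME/var/run/splunk/_raft" "$backup/" 2>/dev/null || true
cp -a "$SPLUNK_HOME"/var/lib/splunk/kvstore/mongo/local.* "$backup/" 2>/dev/null || true
rm -rf "$SPLUNK_HOME/var/run/splunk/_raft"
rm -f "$SPLUNK_HOME"/var/lib/splunk/kvstore/mongo/local.*
echo "backup saved to $backup"
```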

2.5) Check whether a non-default replication_factor is set (if empty, the default replication_factor = 3 is used), or better, just back up all configuration settings under shcluster.
2.6) Choose one member.
2.7) Start this member.
2.8) On this member, change replication_factor to 1:
splunk edit shcluster-config -auth admin:changeme -replication_factor 1
2.9) Restart this member.
2.10) Bootstrap the SHC with just this member:
./bin/splunk bootstrap shcluster-captain -servers_list "https://THIS_MEMBER_URL:8089" -auth admin:changeme
2.11) Verify the SHC status:
splunk show shcluster-status

2.12) Verify the KV store status (should be ready):

curl -k -s https://localhost:8089/services/server/info | grep kvStore

2.13) Now add all the other instances to this first one.
2.14) Start the 2nd Splunk instance.
2.15) Change the replication factor to 1 (to match the first member's configuration):

splunk edit shcluster-config -auth admin:changeme -replication_factor 1
2.16) Restart the member.
2.17) Add this member to the bootstrapped SHC:
splunk add shcluster-member -current_member_uri https://EXISTING_MEMBER_OF_SHC:8089 -auth admin:changeme
2.18) Verify the SHC information:
splunk show shcluster-status
2.19) Verify the KV store status (should be ready):

curl -k -s https://localhost:8089/services/server/info | grep kvStore

2.20) At this point, remove the replication_factor setting shown below from all members, then restart each member and verify that the KV store comes up on it.
$SPLUNK_HOME/etc/system/local/server.conf

[shclustering]

replication_factor = 1
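A sketch of removing the temporary setting; it operates on a throwaway copy so it can be run as-is, but on a real member you would edit $SPLUNK_HOME/etc/system/local/server.conf and then restart Splunk (GNU sed assumed for the in-place edit):

```shell
# Delete the temporary replication_factor line while leaving the
# [shclustering] stanza itself intact (demo on a throwaway copy).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[shclustering]
replication_factor = 1
EOF
sed -i '/^replication_factor[[:space:]]*=/d' "$conf"
cat "$conf"
```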

Let me know if this helps.
