Splunk Enterprise

KV Store failing to start on 9.4.1

romedawg
Engager

I have migrated to 9.4.1. I initially had certificate issues, which have been resolved. However, the KV store still fails to start.

Outside the error below (Failed to connect to target host: ip-10-34-2-203:8191) there are 

 

/opt/splunk/bin/splunk show kvstore-status --verbose
WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
This member:
                   backupRestoreStatus : Ready
                              disabled : 0
           featureCompatibilityVersion : An error occurred during the last operation ('getParameter', domain: '15', code: '13053'): No suitable servers found: `serverSelectionTimeoutMS` expired: [Failed to connect to target host: ip-10-34-2-203:8191]
                                  guid : 4059932D-D941-4186-BE08-6B6426B618CB
                                  port : 8191
                            standalone : 1
                                status : failed
                         storageEngine : wiredTiger

 


mongod.log

 

2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] MongoDB starting : pid=2570573 port=8191 dbpath=/opt/splunk/var/lib/splunk/kvstore/mongo 64-bit host=ip-10-34-2-203
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] db version v4.2.25
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] git version: 41b59c2bfb5121e66f18cc3ef40055a1b5fb6c2e
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2zk-fips  3 Sep 2024
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] allocator: tcmalloc
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] modules: enterprise
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] build environment:
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten]     distmod: rhel70
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten]     distarch: x86_64
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten]     target_arch: x86_64
2025-03-11T15:46:17.377Z I  CONTROL  [initandlisten] options: { net: { bindIp: "0.0.0.0", port: 8191, tls: { CAFile: "opt/splunk/etc/auth/cacert.pem", allowConnectionsWithoutCertificates: true, allowInvalidHostnames: true, certificateKeyFile: "/opt/splunk/etc/auth/server.pem", certificateKeyFilePassword: "<password>", disabledProtocols: "noTLS1_0,noTLS1_1", mode: "requireTLS", tlsCipherConfig: "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RS..." }, unixDomainSocket: { enabled: false } }, replication: { oplogSizeMB: 200, replSet: "4059932D-D941-4186-BE08-6B6426B618CB" }, security: { clusterAuthMode: "sendX509", javascriptEnabled: false, keyFile: "/opt/splunk/var/lib/splunk/kvstore/mongo/splunk.key" }, setParameter: { enableLocalhostAuthBypass: "0", oplogFetcherSteadyStateMaxFetcherRestarts: "0" }, storage: { dbPath: "/opt/splunk/var/lib/splunk/kvstore/mongo", engine: "wiredTiger", wiredTiger: { engineConfig: { cacheSizeGB: 2.25 } } }, systemLog: { timeStampFormat: "iso8601-utc" } }
2025-03-11T15:46:19.083Z I  CONTROL  [initandlisten] ** WARNING: This server will not perform X.509 hostname validation
2025-03-11T15:46:19.083Z I  CONTROL  [initandlisten] ** This may allow your server to make or accept connections to
2025-03-11T15:46:19.083Z I  CONTROL  [initandlisten] ** untrusted parties
2025-03-11T15:46:19.102Z I  REPL     [initandlisten] Rollback ID is 1
2025-03-11T15:46:19.103Z I  REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2025-03-11T15:46:19.122Z I  CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2025-03-11T15:46:19.129Z I  CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: config.system.sessions does not exist
2025-03-11T15:46:19.135Z I  NETWORK  [listener] Listening on 0.0.0.0
2025-03-11T15:46:19.135Z I  NETWORK  [listener] waiting for connections on port 8191 ssl
2025-03-11T15:46:19.298Z I  NETWORK  [listener] connection accepted from 10.34.2.203:56880 #1 (1 connection now open)
2025-03-11T15:46:19.300Z I  NETWORK  [conn1] end connection 10.34.2.203:56880 (0 connections now open)

 

 

server.conf

 

[general]
pass4SymmKey =
serverName = splunk1

[sslConfig]
serverCert = /opt/splunk/etc/auth/server.pem
sslRootCAPath = opt/splunk/etc/auth/cacert.pem
enableSplunkdSSL = true
sslVersions = tls1.2
sslPassword = <yada yada yada>
 
[kvstore]
storageEngine = wiredTiger
serverCert = /opt/splunk/etc/auth/server.pem
sslRootCAPath = opt/splunk/etc/auth/cacert.pem
sslVerifyServerCert = true
sslVerifyServerName = true
sslPassword = <yada yada yada>
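One detail visible in the stanzas above: sslRootCAPath is written without a leading slash in both [sslConfig] and [kvstore], and the mongod options line shows the same value for CAFile. Whether that contributed to the failure is not established in this thread. As a rough sketch, a check like the following lists certificate settings whose value is not an absolute path. The function name and the choice of keys are illustrative assumptions, not part of any Splunk tooling.

```shell
#!/bin/sh
# Sketch: print serverCert / sslRootCAPath settings whose value is not an
# absolute path. find_relative_cert_paths is a hypothetical helper name.
find_relative_cert_paths() {
    awk '
        # Remember the most recent stanza header, e.g. [kvstore]
        /^[ \t]*\[/ { stanza = $0; next }
        /^[ \t]*(serverCert|sslRootCAPath)[ \t]*=/ {
            eq  = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            first = substr(val, 1, 1)
            # Accept "/..." and "$SPLUNK_HOME/..." style values; flag the rest.
            if (first != "/" && first != "$") print stanza " " key " = " val
        }
    ' "$1"
}

# Example (path is an assumption based on the install location in this thread):
#   find_relative_cert_paths /opt/splunk/etc/system/local/server.conf
```

Any line it prints is only a candidate for review; an empty result does not prove the configuration is correct.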

 

 

splunkd.log

When I grep for the hostname:

 

root@ip-10-34-2-203:~# grep ip-10-34-2-203 /opt/splunk/var/log/splunk/splunkd.log
03-11-2025 12:48:48.418 +0000 INFO  ServerConfig [0 MainThread] - My hostname is "ip-10-34-2-203".
03-11-2025 12:48:48.466 +0000 INFO  loader [2492128 MainThread] - System info: Linux, ip-10-34-2-203, 5.15.0-1077-aws, #84~20.04.1-Ubuntu SMP Mon Jan 20 22:14:54 UTC 2025, x86_64.
03-11-2025 12:49:01.958 +0000 INFO  PubSubSvr [2492128 MainThread] - Subscribed: channel=deploymentServer/phoneHome/default connectionId=connection_127.0.0.1_8089_ip-10-34-2-203_direct_ds_default listener=0x7f2a306bfa00
03-11-2025 12:49:01.958 +0000 INFO  PubSubSvr [2492128 MainThread] - Subscribed: channel=deploymentServer/phoneHome/default connectionId=connection_127.0.0.1_8089_ip-10-34-2-203_direct_ds_default listener=0x7f2a306bfa00
03-11-2025 12:49:01.958 +0000 INFO  PubSubSvr [2492128 MainThread] - Subscribed: channel=deploymentServer/phoneHome/default/metrics connectionId=connection_127.0.0.1_8089_ip-10-34-2-203_direct_ds_default listener=0x7f2a306bfa00
03-11-2025 12:49:01.958 +0000 INFO  PubSubSvr [2492128 MainThread] - Subscribed: channel=tenantService/handshake connectionId=connection_127.0.0.1_8089_ip-10-34-2-203_direct_tenantService listener=0x7f2a306bfc00
03-11-2025 13:44:36.801 +0000 ERROR KVStorageProvider [2493368 TcpChannelThread] - An error occurred during the last operation ('collectionStats', domain: '15', code: '13053'): No suitable servers found: `serverSelectionTimeoutMS` expired: [Failed to connect to target host: ip-10-34-2-203:8191]
03-11-2025 13:44:36.801 +0000 ERROR CollectionConfigurationProvider [2493368 TcpChannelThread] - Failed to get collection stats for collection="era_email_notification_switch" with error: No suitable servers found: `serverSelectionTimeoutMS` expired: [Failed to connect to target host: ip-10-34-2-203:8191]
03-11-2025 14:03:17.838 +0000 ERROR KVStorageProvider [2493425 TcpChannelThread] - An error occurred during the last operation ('getParameter', domain: '15', code: '13053'): No suitable servers found: `serverSelectionTimeoutMS` expired: [Failed to connect to target host: ip-10-34-2-203:8191]
03-11-2025 14:03:17.842 +0000 ERROR KVStorageProvider [2493425 TcpChannelThread] - An error occurred during the last operation ('replSetGetStatus', domain: '15', code: '13053'): No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling hello on 'ip-10-34-2-203:8191']
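The errors above name ip-10-34-2-203:8191 as the target host. One way to narrow down whether the failure is at the TLS layer is to pull the target out of the log and try a handshake against it with the bundled openssl. This is a sketch under assumptions: the helper names are made up for illustration, SPLUNK_BIN defaults to the install path used in this thread, and the CA file to test against is whatever sslRootCAPath is meant to point at.

```shell
#!/bin/sh
# Extract the unique HOST:PORT values from
# "Failed to connect to target host: HOST:PORT]" messages read on stdin.
kvstore_targets_from_log() {
    sed -n 's/.*Failed to connect to target host: \([^]]*\)].*/\1/p' | sort -u
}

# Attempt a TLS handshake against the KV store port and show the verify result.
# "Verify return code: 0 (ok)" means the presented chain validated against ca_file.
check_kvstore_tls() {
    target=$1
    ca_file=$2
    splunk_bin=${SPLUNK_BIN:-/opt/splunk/bin/splunk}
    "$splunk_bin" cmd openssl s_client -connect "$target" -CAfile "$ca_file" \
        </dev/null 2>&1 | grep 'Verify return code'
}

# Example:
#   kvstore_targets_from_log < /opt/splunk/var/log/splunk/splunkd.log
#   check_kvstore_tls ip-10-34-2-203:8191 /opt/splunk/etc/auth/cacert.pem
```

The handshake check only works while mongod is actually listening, so it needs to run during the window in which Splunk is still trying to start the KV store.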

 

 

These are other errors I noticed that might be related

 

03-11-2025 14:37:31.298 +0000 ERROR X509Verify [2538813 ApplicationUpdateThread] - Server  X509 certificate (CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1,O=DigiCert Inc,C=US) failed validation; error=20, reason="unable to get local issuer certificate"
03-11-2025 14:37:31.298 +0000 WARN  SSLCommon [2538813 ApplicationUpdateThread] - Received fatal SSL3 alert. ssl_state='error', alert_description='unknown CA'.
03-11-2025 14:37:31.298 +0000 WARN  HttpClientRequest [2538813 ApplicationUpdateThread] - Returning error HTTP/1.1 502 Error connecting: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed - please check the output of the `openssl verify` command for the certificates involved; note that if certificate verification is enabled (requireClientCert or sslVerifyServerCert set to "true"), the CA certificate and the server certificate should not have the same Common Name.
03-11-2025 14:37:31.298 +0000 ERROR ApplicationUpdater [2538813 ApplicationUpdateThread] - Error checking for update, URL=https://apps.splunk.com/api/apps:resolve/checkforupgrade: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed - please check the output of the `openssl verify` command for the certificates involved; note that if certificate verification is enabled (requireClientCert or sslVerifyServerCert set to "true"), the CA certificate and the server certificate should not have the same Common Name.
 
 
 
03-11-2025 14:38:57.211 +0000 ERROR KVStoreConfigurationProvider [2536490 KVStoreConfigurationThread] - Failed to start mongod on first attempt reason=Failed to receive response from kvstore error=, service not ready after waiting for timeout=301389ms
03-11-2025 14:38:57.211 +0000 ERROR KVStoreConfigurationProvider [2536490 KVStoreConfigurationThread] - Could not start mongo instance. Initialization failed.
03-11-2025 14:38:57.211 +0000 WARN  KVStoreConfigurationProvider [2536490 KVStoreConfigurationThread] - Action scheduled, but event loop is not ready yet
03-11-2025 14:38:57.211 +0000 ERROR KVStoreBulletinBoardManager [2536490 KVStoreConfigurationThread] - KV Store changed status to failed. Failed to start KV Store process. See mongod.log and splunkd.log for details..
03-11-2025 14:38:57.211 +0000 ERROR KVStoreBulletinBoardManager [2536490 KVStoreConfigurationThread] - Failed to start KV Store process. See mongod.log and splunkd.log for details.
03-11-2025 14:38:57.211 +0000 INFO  KVStoreConfigurationProvider [2536490 KVStoreConfigurationThread] - Mongod service shutting down

 

yeti
Engager

For anyone upgrading to 9.4.1 and getting this, please read.

I had this exact problem when testing (upgrading from 9.3.1 to 9.4.1) in a sandbox environment.
After many "failed" attempts I realized this is actually just a normal status and part of the kvstore upgrade process.

What actually happens is that the kvstore upgrades in steps: 4.x -> 5.x -> 6.x -> 7.x

During these steps, the kvstore status shows as failed.

featureCompatibilityVersion : 5.0
...
status : failed
...
serverVersion : 5.0.26

featureCompatibilityVersion : 6.0
...
status : failed
...
serverVersion : 6.0.15

featureCompatibilityVersion : An error occurred during the last operation ('getParameter', domain: '15', code: '13053'): No suitable servers found: `serverSelectionTimeoutMS` expired: [Failed to connect to target host: 127.0.0.1:8191]
...
status : failed

You also cannot rely on the "splunk show standalone-kvupgrade-status" command:

/opt/splunk/bin/splunk show standalone-kvupgrade-status
Unable to read mongo database version. Check KV Store health.

 

Just ignore all of these and let the system run for 5+ minutes (do not stop Splunk!).

And then finally the upgrade completes and the status goes to ready.

featureCompatibilityVersion : 7.0
...
status : ready
...
serverVersion : 7.0.14
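The wait-and-watch approach described above can be scripted rather than done by hand. A minimal sketch, assuming the status line looks like the "status : failed" / "status : ready" lines quoted in this thread; the function names, the 30-second interval, the default attempt count, and the SPLUNK_BIN default are all assumptions. The CLI may also prompt for credentials unless a session is already authenticated.

```shell
#!/bin/sh
# Read `splunk show kvstore-status` output on stdin and print the value of
# the first "status" field (backupRestoreStatus is deliberately not matched).
kvstore_status_from_output() {
    awk -F' : ' '$1 ~ /^[ \t]*status$/ { print $2; exit }'
}

# Poll until the status is "ready" or the attempts run out.
# Returns 0 on ready, 1 otherwise. Splunk is never stopped or restarted here.
wait_for_kvstore_ready() {
    splunk_bin=${SPLUNK_BIN:-/opt/splunk/bin/splunk}
    attempts=${1:-20}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$("$splunk_bin" show kvstore-status 2>/dev/null | kvstore_status_from_output)
        echo "kvstore status: ${status:-unknown}"
        [ "$status" = "ready" ] && return 0
        i=$((i + 1))
        sleep 30
    done
    return 1
}

# Example: wait_for_kvstore_ready 20   # roughly ten minutes at 30s per attempt
```

Intermediate "failed" readings are expected while the upgrade steps run, so the loop treats only "ready" as a terminal state.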

 

VatsalJagani
SplunkTrust

@romedawg- I'm not 100% sure whether your configuration is correct. But I would suggest going through these two articles, as they cover something related to what you are seeing, and I have personally faced a KV store issue related to SSL certificates.

https://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/MeetSplunk

https://docs.splunk.com/Documentation/Splunk/9.4.1/Admin/MigrateKVstore

 

romedawg
Engager

Thanks for the response!

 

I ended up figuring this out, or at least found a solution that allowed kvstore/mongodb to start and upgrade to version 7 (as part of the 9.4.1 upgrade).

I created a CA and a self-signed cert and updated the [sslConfig] stanza.

cat cert.pem private.key rootca.pem > server.pem
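For anyone repeating the concatenation step, a quick sanity check on the combined file is to list the PEM blocks it contains and confirm they appear in the order used above (certificate, private key, CA certificate). This is only a sketch; build_server_pem and pem_block_order are hypothetical helper names, and the file names are the ones from this post.

```shell
#!/bin/sh
# Concatenate server cert, private key, and CA cert into one PEM file,
# in that order. Arguments: cert key ca output
build_server_pem() {
    cat "$1" "$2" "$3" > "$4"
}

# Print the type of each PEM block in a file, one per line, in file order.
pem_block_order() {
    sed -n 's/^-----BEGIN \(.*\)-----$/\1/p' "$1"
}

# Example:
#   build_server_pem cert.pem private.key rootca.pem server.pem
#   pem_block_order server.pem
```

The key block label depends on how the key was generated (for example, an encrypted key carries a different label), so compare the order rather than expecting an exact string.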

[sslConfig]
serverCert = /opt/splunk/etc/auth/certs/server.pem
sslRootCAPath = /opt/splunk/etc/auth/certs/gohealth-splunk-ca.pem
enableSplunkdSSL = true
sslVerifyServerCert = true
sslVerifyServerName = true
cliVerifyServerName = true
sslVersions = tls1.2
sslPassword = <Redacted>

 

$ /opt/splunk/bin/splunk cmd openssl verify -verbose -x509_strict -CAfile gohealth-splunk-ca.pem server.crt
server.crt: OK
  
$ /opt/splunk/bin/splunk cmd openssl x509 -in server.pem -noout -purpose
Certificate purposes:
SSL client : Yes
SSL client CA : No
SSL server : Yes
SSL server CA : No

 

I will continue testing with Let's Encrypt certs but may have to live with this for the time being.


VatsalJagani
SplunkTrust

@romedawg- Great to hear that you fixed the issue, and thank you for writing up your solution on the Splunk Community.

Kindly accept your answer (your last message here) by clicking "Accept as Solution" so that future Splunk Community users can also benefit from it.
