Windows upgrade from 8.1.1 to 9.0: Why does it fail to start the KV Store process?

Sh4ne0
Explorer

I see lots of suggestions in the Community for Linux, but not for Windows. Has anyone resolved this on a production Windows Server 2016 machine?

I get three errors when logging in to Splunk after the upgrade to 9.0:

Failed to start the KV Store See mongod.log and splunkd.log for details.

KV Store changed status to failed. KV Store process terminated.

KV Store process terminated abnormally (exit code 1, status exited with code 1) See mongod.log and splunkd.log for details

jonxilinx
Path Finder

This seems to be a big problem; it affects the migration of UFs in our environment:

9.0.0.1 -> 9.0.1
8.2.6 -> 9.0.1

The sourcetype=splunk_migration events show that every UF client suffers a 15-minute outage while it waits for the UF to "upgrade the KV store".

---

Failed to start mongod. Did not get EOF from mongod after 5 second(s).
[App Key Value Store migration] Starting migrate-kvstore.
Created version file path=C:\Program Files\SplunkUniversalForwarder\var\run\splunk\kvstore_upgrade\versionFile36
[App Key Value Store migration] Collection data is not available.
Starting KV Store storage engine upgrade:
Phase 1 (dump) of 2:
Failed to migrate to storage engine wiredTiger, reason=Failed to receive response from kvstore error=, service not ready after waiting for timeout=914000ms
[App Key Value Store migration] Starting migrate-kvstore.
[App Key Value Store migration] Migration is not required.
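The 15-minute outage matches the timeout in this log: timeout=914000ms is about 15.2 minutes (914000 / 60000). A minimal Python sketch, assuming the message format shown in this post (migration_timeout_minutes is a hypothetical helper, not a Splunk tool), that pulls that value out of a log line and converts it to minutes:

```python
import re

# Matches the "timeout=<N>ms" fragment seen in the KV store migration error.
TIMEOUT_RE = re.compile(r"timeout=(\d+)ms")


def migration_timeout_minutes(line):
    """Return the wait time in minutes from a migration error line, or None if absent."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)) / 60000  # 60,000 ms per minute


sample = ("Failed to migrate to storage engine wiredTiger, reason=Failed to receive "
          "response from kvstore error=, service not ready after waiting for timeout=914000ms")
print(round(migration_timeout_minutes(sample), 1))  # 15.2
```

Applied to the timeout=608828ms error reported later in this thread, the same helper gives about 10.1 minutes.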

If I get anything back from my Splunk support ticket, I will post it here.

 

jonxilinx
Path Finder

This was the workaround from Splunk Support for the Windows "UF upgrade" KV store problem:

"Hello Jonathan,

Thank you for your time on today's call.

I have just tested the Splunk Universal Forwarder 8.2.6 to Splunk Universal Forwarder 9.0.1 upgrade on Windows Server 2019.

I added the following setting to server.conf:

[kvstore]
disabled = true

The upgrade completed within 30 seconds of starting the upgrade process.

Please let me know if you have any further questions prior to concluding this case"
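To confirm the workaround is in place before upgrading, here is a rough Python sketch that checks whether a server.conf text disables the KV store. It assumes every setting sits inside a [stanza] and uses Python's configparser as an approximation of Splunk's .conf parsing (line continuations and file precedence are not handled); kvstore_disabled is a hypothetical helper. For the effective, merged setting, splunk btool server list kvstore --debug remains the authoritative check.

```python
import configparser


def kvstore_disabled(conf_text):
    """Return True if the [kvstore] stanza in conf_text sets disabled to a true value."""
    # interpolation=None: Splunk .conf values may legitimately contain '%'.
    # strict=False: tolerate repeated stanzas or keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # getboolean accepts true/false, 1/0, yes/no and on/off (case-insensitive);
    # a missing stanza or key falls back to False.
    return parser.getboolean("kvstore", "disabled", fallback=False)


# Hypothetical server.conf contents for illustration.
example = """
[general]
serverName = uf01

[kvstore]
disabled = true
"""
print(kvstore_disabled(example))  # True
```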

PickleRick
SplunkTrust

Yeah, I hit the "KV store upgrade" problem on a Linux UF recently while upgrading from 8.2.something to 9.0.1.

I also disabled the KV store explicitly to make it start properly. Otherwise it would get stuck trying to start the KV store for the upgrade to WiredTiger.

jeremyhagand61
Communicator

I had this issue recently. Apparently it is a known issue, but the issue number Splunk Support gave me was either wrong or not public.

On Windows, when you upgrade to 9.x, the upgrade process creates a PFX file of the splunkd certificate and imports it into the Windows Local Computer certificate store. The account running Splunk needs access to this store to read the private key.

In my case the Splunk service account WAS a local admin and had access to the Local Computer certificate store. The Splunkd certificate was there and Windows was reporting that I had the private key for the certificate. However, this proved not to be true. When I right-clicked on the certificate in question and selected All Tasks > Manage Private Keys, there was a pop-up which said "No keys found for certificate!"

Luckily, I had the original certificate and key (with the password), so I just used the following command (run from the bin directory) to recreate the PFX file:

.\splunk.exe cmd openssl pkcs12 -export -out splunkd.pfx -inkey ..\etc\auth\mycerts\splunkd.key -in ..\etc\auth\mycerts\splunkd.pem

Note that the PEM file used here is NOT the one with the full chain that you might have configured in your server.conf file. It is just the certificate file alone.

After I had the PFX, I was able to delete the broken certificate and re-import the new PFX file into the Windows store. Once that is done, restart Splunk and your KV store should be working again.

Note that I had this problem on one of the servers I upgraded but not on the other. Then, when I replaced the certificate on that server as part of a further security upgrade, the problem occurred again.

jeremyhagand61
Communicator

Further to this, I have just done another upgrade on an instance that uses the default certificate. In this case, it was enough to delete the PFX that was created under /etc/auth (you can tell from its creation date that it was just created) and delete the imported certificate from the certificate store; this time it is called "SplunkServerDefaultCert". Then I restarted Splunk, and the PFX was recreated and the certificate was imported correctly with the private key.
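To spot the PFX the upgrade just wrote, here is a small Python sketch that looks at .pfx files directly under etc\auth and returns the newest one. It uses modification time as a stand-in for the creation date mentioned above, the install path in the example is an assumed default, and newest_pfx is a hypothetical helper; it only reports the file and deletes nothing.

```python
from pathlib import Path


def newest_pfx(auth_dir):
    """Return the most recently modified .pfx file directly under auth_dir, or None."""
    candidates = [p for p in Path(auth_dir).glob("*.pfx") if p.is_file()]
    if not candidates:
        return None
    # Modification time stands in for the creation date; for a file the
    # upgrade just wrote, the two should be effectively the same.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Assumed default install path; adjust to your $SPLUNK_HOME.
print(newest_pfx(r"C:\Program Files\Splunk\etc\auth"))
```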

AFAS
Explorer

Thanks!

agw
Path Finder

Thanks to Jeremy for the details on this; I was able to get it working by following his instructions. After upgrading to 9.0.1, I got the same error message, "InvalidSSLConfiguration: Could not read private key attached.....", but only on the second server I upgraded. I happened to already have a PFX file that we use for our wildcard certificate. I deleted the PFX that was created by the upgrade, and then deleted the certificate from the Windows store under Local Computer Certificates > Personal > Certificates. I manually imported my PFX into Local Computer Certificates > Personal > Certificates. After that, I stopped and started the splunkd service. I checked the mongod log file (C:\Program Files\Splunk\var\log\Splunk) and everything looked fine.

I then went back to the first server, which I had already upgraded successfully, and repeated the process there. I did so because of what Jeremy pointed out: when you right-click the certificate and select All Tasks > Manage Private Keys, you get the message "No keys found". So even though the install succeeded, the certificate isn't complete.

NotSure
Explorer

@Sh4ne0 I finally got this issue cleared. It looks like it was an issue with service account permissions. The steps below are what worked for me.

- I granted the service account that runs the splunkd service full control permissions on the entire Splunk home directory.

- I granted the service account permissions to the private key of the server certificate in the certificate store.

- When I ran the MSI for version 9, I ran it as the splunkd service account.

After doing all this, the mongod log shows successful authentication using the server certificate, and the KV Store starts without error. I'm not 100% sure exactly which of the steps above corrected the issue, but hopefully it works for you as well.

I'm still having an issue on one Windows heavy forwarder where the KV Store failed to upgrade to the latest version but still initializes fine.

nkaru
Explorer

@NotSure Did you re-run the setup under the splunk service account?

Also, can you confirm exactly which certificate you gave permissions to?

I initially ran the install as a domain admin and am getting "failed to read certificate" errors.

NotSure
Explorer

@nkaru Yes, I ran the MSI package for version 9 as the service account for the splunkd service.

nkaru
Explorer

@NotSure I get an error saying this version of Splunk is already installed when I try to rerun the v9 setup under the Splunk service account. Did you do a repair?

NotSure
Explorer

@nkaru No, I had reverted to a snapshot taken before the V9 update and then ran the MSI as the service account.

nkaru
Explorer

@NotSure Unfortunately, reverting to a snapshot is not an option because of the nature of the logging we use Splunk for in our environment.

nkaru
Explorer

This fixed my issue:

Under splunk\etc\auth, rename server.pem to server.pem.old, and then restart the Splunk service.

No more warnings, and https://localhost:8089/services/server/info now shows kvStoreStatus as "ready".
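The rename and the status check can be sketched in Python as follows. This assumes, as the result above suggests, that Splunk writes a fresh default server.pem on restart, and it assumes the JSON returned by /services/server/info?output_mode=json carries kvStoreStatus under entry[0].content; rename_server_pem and kvstore_status are hypothetical helpers. The response body can be fetched with, for example, curl -k -u admin "https://localhost:8089/services/server/info?output_mode=json" (-k skips certificate verification, which is only reasonable against localhost).

```python
import json
from pathlib import Path


def rename_server_pem(auth_dir):
    """Rename server.pem to server.pem.old (the workaround above); restart Splunk afterwards."""
    pem = Path(auth_dir) / "server.pem"
    backup = pem.with_name("server.pem.old")
    if backup.exists():
        # Refuse to clobber an earlier backup (a plain rename would replace it on POSIX).
        raise FileExistsError(f"{backup} already exists")
    pem.rename(backup)
    return backup


def kvstore_status(server_info_json):
    """Return kvStoreStatus from a /services/server/info?output_mode=json response body."""
    body = json.loads(server_info_json)
    # Assumed layout: the first entry's "content" holds the server info fields.
    return body["entry"][0]["content"].get("kvStoreStatus")


sample = '{"entry": [{"name": "server-info", "content": {"kvStoreStatus": "ready"}}]}'
print(kvstore_status(sample))  # ready
```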

pwrdwn
Engager

I am having the identical issues on MS Server 2016. I was on the phone for two hours and it was a "No Go".

C:\Program Files\Splunk\bin>splunk migrate-kvstore

^C

C:\Program Files\Splunk\bin>splunk migrate migrate-kvstore

[App Key Value Store migration] Starting migrate-kvstore.

[App Key Value Store migration] Migration is not required.

 

C:\Program Files\Splunk\bin>splunk migrate kvstore-storage-engine --target-engine wiredTiger

Starting KV Store storage engine upgrade:

Phase 1 (dump) of 2:

ERROR: Failed to migrate to storage engine wiredTiger, reason=Failed to receive response from kvstore error=, service not ready after waiting for timeout=608828ms

When I run kvstore-status, I get "Splunk services need to be running". Really? Because they are... Anyhow, lots of issues, and I really need to get this fixed.

Thanks

NotSure
Explorer

@pwrdwn Are you running the "splunk migrate kvstore-storage-engine --target-engine wiredTiger" migration command after installing version 9, or before?

I initially couldn't get the command to work before the update, so I tried it afterwards and received the same error you posted. I reverted the snapshot and then got it to run before installing version 9 (a cert path issue in my server.conf file had caused the initial failure).
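Because a bad certificate path in server.conf caused the first failure here, a rough Python sketch that reports [sslConfig] file paths that do not exist may help. serverCert and sslRootCAPath are real server.conf settings, but the set of keys checked is an assumption you may need to extend, $SPLUNK_HOME is expanded by simple string replacement, every setting is assumed to sit inside a [stanza], and missing_ssl_paths is a hypothetical helper.

```python
import configparser
from pathlib import Path

# Path-valued [sslConfig] settings to check; extend for your own configuration.
PATH_KEYS = ("serverCert", "sslRootCAPath")


def missing_ssl_paths(conf_text, splunk_home):
    """Return (setting, path) pairs from [sslConfig] whose files do not exist."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep Splunk's camelCase setting names
    parser.read_string(conf_text)
    if not parser.has_section("sslConfig"):
        return []
    missing = []
    for key in PATH_KEYS:
        value = parser.get("sslConfig", key, fallback=None)
        if value is None:
            continue
        resolved = Path(value.replace("$SPLUNK_HOME", str(splunk_home)))
        if not resolved.is_file():
            missing.append((key, str(resolved)))
    return missing


# Hypothetical server.conf excerpt, using the mycerts layout from earlier in the thread.
conf = """
[sslConfig]
serverCert = $SPLUNK_HOME/etc/auth/mycerts/splunkd.pem
"""
print(missing_ssl_paths(conf, r"C:\Program Files\Splunk"))
```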

 

pwrdwn
Engager

I ran it after upgrading from 8.2.4.

My Upgrade Readiness App shows I'm good to go on the jQuery checks, but I do get errors on TLS/DNS and SSL. I'm not sure if that has anything to do with my KV Store failure.

I did read something about upgrading with the same admin credentials as the original install, but I'm not sure about that either.

So, lots of questions but nothing solid yet, unless I reinstall version 9.0 with the same admin credentials? Does anybody have thoughts?

Thanks

 

 

nkaru
Explorer

Same issue on Windows Server 2019 after upgrading from 8.2.6 to 9.0. I confirmed the certificates are not expired, and we are using the default certificates.

Failed global initialization: InvalidSSLConfiguration: Could not read private key attached to the selected certificate, ensure it exists and check the private key permissions

PickleRick
SplunkTrust

Since mongod reports problems accessing the crypto material, I'd check the effective access rights to the key/cert files for the user Splunk runs as. Maybe the installer messed up some permissions.
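One way to test effective access is to actually open the files as the service account, because on Windows os.access may not reflect NTFS ACLs. A minimal Python sketch, assuming it is launched as the Splunk service account (for example with runas /user:DOMAIN\svc_splunk, where the account name is a placeholder); unreadable_files is a hypothetical helper and the paths are examples based on this thread. It checks file-system permissions only, not private-key permissions in the Windows certificate store.

```python
def unreadable_files(paths):
    """Return (path, error name) for each file the current account cannot read."""
    problems = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handle.read(1)  # actually reading exercises the effective permissions
        except OSError as exc:  # PermissionError, FileNotFoundError, etc.
            problems.append((str(path), type(exc).__name__))
    return problems


# Example paths based on this thread; adjust to your install.
print(unreadable_files([
    r"C:\Program Files\Splunk\etc\auth\server.pem",
    r"C:\Program Files\Splunk\etc\auth\mycerts\splunkd.key",
]))
```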

nkaru
Explorer

The service account has full control of $SPLUNK_HOME/etc/auth and local admin rights on the Splunk server.
