Hello,
I am giving the Splunk Enterprise 7.1.3 to 7.2.0 upgrade a try in my test environment, and I am currently stuck on the Search Cluster upgrade. I first attempted to do one node at a time, which failed, and then took the entire search cluster offline to do the upgrade. Now, I cannot get the Splunk service to start back up and am getting the following error message.
The Search Deployer had a similar issue with the upgrade, but it was resolved with a simple reboot of the instance. I tried the same thing on the search node and also killed one leftover mongodb process, but neither helped.
I also attempted to run the 'splunk migrate migrate-kvstore' command based on other Splunk Answers posts, which failed for the same reason.
It seems that the Splunk default certificates are being used. If certificate validation is turned on using the default certificates (not-recommended), this may result in loss of communication in mixed-version Splunk environments after upgrade.
"/opt/splunk/etc/auth/ca.pem": already a renewed Splunk certificate: skipping renewal
"/opt/splunk/etc/auth/cacert.pem": already a renewed Splunk certificate: skipping renewal
Clustering migration already complete, no further changes required.
Generating checksums for datamodel and report acceleration bucket summaries for all indexes.
If you have defined many indexes and summaries, summary checksum generation may take a long time.
Processed 2 out of 22 configured indexes.
Processed 4 out of 22 configured indexes.
Processed 6 out of 22 configured indexes.
Processed 8 out of 22 configured indexes.
Processed 10 out of 22 configured indexes.
Processed 12 out of 22 configured indexes.
Processed 14 out of 22 configured indexes.
Processed 16 out of 22 configured indexes.
Processed 18 out of 22 configured indexes.
Processed 20 out of 22 configured indexes.
Processed 22 out of 22 configured indexes.
Finished generating checksums for datamodel and report acceleration bucket summaries for all indexes.
ERROR: Failed to migrate mongo feature compatibility version:
ERROR while running migrate-kvstore migration.
I looked in the splunkd.log and mongod.log files, but no new events have been written since I shut down the service prior to starting the 'rpm' upgrade. Both files end with the related shutdown events, as shown below.
[root@ip-10-2-31-134 ~]# tail -n 10 /opt/splunk/var/log/splunk/splunkd.log
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_Queue"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_CallbackRunner"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_HttpClient"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_DmcProxyHttpClient"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_Duo2FAHttpClient"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_ApplicationLicenseChecker"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_S3ConnectionPoolManager"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - shutting down level "ShutdownLevel_TelemetryMetricBuffer"
10-11-2018 18:48:34.583 +0000 INFO ShutdownHandler - Shutdown complete in 36.05 seconds
10-11-2018 18:48:35.581 +0000 INFO loader - All pipelines finished.
[root@ip-10-2-31-134 ~]# tail -n 10 /opt/splunk/var/log/splunk/mongod.log
2018-10-11T18:48:02.886Z I JOURNAL [signalProcessingThread] old journal file /opt/splunk/var/lib/splunk/kvstore/mongo/journal/j._0 will be reused as /opt/splunk/var/lib/splunk/kvstore/mongo/journal/prealloc.0
2018-10-11T18:48:02.887Z I JOURNAL [signalProcessingThread] Terminating durability thread ...
2018-10-11T18:48:02.986Z I JOURNAL [journal writer] Journal writer thread stopped
2018-10-11T18:48:02.986Z I JOURNAL [durability] Durability thread stopped
2018-10-11T18:48:02.986Z I STORAGE [signalProcessingThread] shutdown: closing all files...
2018-10-11T18:48:02.986Z I STORAGE [signalProcessingThread] closeAllFiles() finished
2018-10-11T18:48:02.986Z I STORAGE [signalProcessingThread] shutdown: removing fs lock...
2018-10-11T18:48:02.986Z I CONTROL [signalProcessingThread] now exiting
2018-10-11T18:48:02.986Z I CONTROL [signalProcessingThread] shutting down with code:0
2018-10-11T18:48:02.986Z I CONTROL [initandlisten] shutting down with code:0
Thanks,
Erik
Success...for my issues!!!
I believe I have solved all of the problems I was seeing, including the upgrade/migration failures, the kvstore not starting, and the SSL error message, by removing the double quotes around the “tls1.2” value for the sslVersions setting in a custom app we deploy to our instances. I am still working through the upgrades of the other instances in the environment to make sure.
[root@ip-10-2-29-7 tmp]# /opt/splunk/bin/splunk cmd btool server list --debug | grep tls
/opt/splunk/etc/system/default/server.conf sslVersions = tls1.2
/opt/splunk/etc/system/default/server.conf sslVersions = tls1.2
/opt/splunk/etc/apps/aws-poc-test-us-east-1-infrastructure-outputs/local/server.conf sslVersions = tls1.2
/opt/splunk/etc/system/default/server.conf sslVersionsForClient = tls1.2
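For anyone who wants to check their own apps for the same problem, here is a minimal Python sketch that flags sslVersions-style settings whose value is wrapped in double quotes. It only inspects text you pass in; the function name and the sample stanza contents are hypothetical and used purely for illustration, and it is not something that was run as part of the fix described here.

```python
import re

# Matches lines such as:  sslVersions = "tls1.2"
# i.e. a setting whose name starts with sslVersions and whose whole
# value is wrapped in double quotes.
QUOTED_SSL = re.compile(r'^\s*(sslVersions\w*)\s*=\s*"([^"]*)"\s*$')


def find_quoted_ssl_versions(conf_text):
    """Return (line_number, setting, value) for each quoted sslVersions* value."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = QUOTED_SSL.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


# Hypothetical server.conf contents, for illustration only.
sample = '[sslConfig]\nsslVersions = "tls1.2"\nsslVersionsForClient = tls1.2\n'
print(find_quoted_ssl_versions(sample))
```

To use it on a real file, read the file's contents (for example the local/server.conf of the custom app) into a string and pass that string to the function.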
Let me walk through the steps I took to finally get the Cluster Master successfully upgraded to v7.2.0. These are the same steps I am going to work through on the other Splunk Enterprise instances within our SplunkPOC environment.
Below are the second and third outputs from the “./splunk show kvstore-status” command as outlined above. The first one scrolled off the terminal screen before I could grab it.
[root@ip-10-2-29-7 tmp]# /opt/splunk/bin/splunk show kvstore-status
Your session is invalid. Please login.
Splunk username: admin
Password:
This member:
backupRestoreStatus : Ready
disabled : 0
guid : 386AC707-E7CA-4827-9E6A-2116283D9727
port : 8191
standalone : 1
status : starting
[root@ip-10-2-29-7 tmp]# /opt/splunk/bin/splunk show kvstore-status
Your session is invalid. Please login.
Splunk username: admin
Password:
This member:
backupRestoreStatus : Ready
date : Fri Nov 2 19:42:38 2018
dateSec : 1541187758.074
disabled : 0
guid : 386AC707-E7CA-4827-9E6A-2116283D9727
oplogEndTimestamp : Fri Nov 2 19:42:37 2018
oplogEndTimestampSec : 1541187757
oplogStartTimestamp : Wed Aug 29 22:16:47 2018
oplogStartTimestampSec : 1535581007
port : 8191
replicaSet : 386AC707-E7CA-4827-9E6A-2116283D9727
replicationStatus : KV store captain
standalone : 1
status : ready
KV store members:
127.0.0.1:8191
configVersion : 1
electionDate : Fri Nov 2 19:42:26 2018
electionDateSec : 1541187746
hostAndPort : 127.0.0.1:8191
optimeDate : Fri Nov 2 19:42:37 2018
optimeDateSec : 1541187757
replicationStatus : KV store captain
uptime : 13
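The two outputs above differ mainly in the "status" line of the "This member" section (starting, then ready). A rough Python sketch for pulling that value out of captured output follows. It assumes the "key : value" layout shown above, and the function name is hypothetical.

```python
def member_status(output):
    """Return the first 'status' value found in kvstore-status text, or None."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Only an exact 'status' key counts, so backupRestoreStatus and
        # replicationStatus are skipped.
        if sep and key.strip() == "status":
            return value.strip()
    return None


# Abbreviated copy of the first output above, for illustration.
captured = """This member:
backupRestoreStatus : Ready
disabled : 0
port : 8191
standalone : 1
status : starting
"""
print(member_status(captured))
```

This only parses text that has already been captured; how the command output gets collected (and how credentials are supplied) is left out on purpose.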
I am working through upgrading the rest of our SplunkPOC environment, starting with the Search Cluster, then the Indexing Cluster, followed by the other parts, using the process outlined above. I am going to start by just removing the double quotes around the “tls1.2” value for the sslVersions setting and seeing whether the upgrade/migration completes without any issues. If the upgrade/migration still fails, I will complete the entire process as outlined above.
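The quote removal itself could be sketched in Python as below. This is only an illustration of the edit, under the assumption that the offending line looks like sslVersions = "tls1.2". The function name is hypothetical, and the sketch works on a string rather than touching any file, so back up the real server.conf before applying anything like it.

```python
import re

# Captures the setting prefix, the quoted value, and any trailing whitespace.
QUOTED_VALUE = re.compile(r'^(\s*sslVersions\w*\s*=\s*)"([^"]*)"(\s*)$')


def strip_ssl_version_quotes(conf_text):
    """Return conf_text with double quotes removed from sslVersions* values."""
    fixed = [QUOTED_VALUE.sub(r"\1\2\3", line) for line in conf_text.splitlines()]
    # Keep the trailing newline if the input had one.
    return "\n".join(fixed) + ("\n" if conf_text.endswith("\n") else "")


# Hypothetical before/after, for illustration only.
before = '[sslConfig]\nsslVersions = "tls1.2"\n'
print(strip_ssl_version_quotes(before))
```

Lines that do not match the pattern, including other settings and comments, are passed through unchanged.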
Can you indicate what specific changes you made to which specific files to resolve the problem?