In upgrading Splunk Enterprise from 7.1.3 to 7.2.0, why is the Mongo Migration failing?

eandresen
Path Finder

Hello,

I am giving the Splunk Enterprise 7.1.3 to 7.2.0 upgrade a try in my test environment, and I am currently stuck on the Search Cluster upgrade. I first attempted to do one node at a time, which failed, and then took the entire search cluster offline to do the upgrade. Now, I cannot get the Splunk service to start back up and am getting the following error message.

The Search Deployer had a similar issue with the upgrade, but it was resolved with a simple reboot of the instance. I tried the same thing on the search node and also killed one leftover mongodb process, but neither helped.

I also attempted to run the 'splunk migrate migrate-kvstore' command based on other Splunk Answers posts, which also failed for the same reason.

It seems that the Splunk default certificates are being used. If certificate validation is turned on using the default certificates (not-recommended), this may result in loss of communication in mixed-version Splunk environments after upgrade. 

"/opt/splunk/etc/auth/ca.pem": already a renewed Splunk certificate: skipping renewal
"/opt/splunk/etc/auth/cacert.pem": already a renewed Splunk certificate: skipping renewal
Clustering migration already complete, no further changes required.

Generating checksums for datamodel and report acceleration bucket summaries for all indexes.
If you have defined many indexes and summaries, summary checksum generation may take a long time.
Processed 2 out of 22 configured indexes.
Processed 4 out of 22 configured indexes.
Processed 6 out of 22 configured indexes.
Processed 8 out of 22 configured indexes.
Processed 10 out of 22 configured indexes.
Processed 12 out of 22 configured indexes.
Processed 14 out of 22 configured indexes.
Processed 16 out of 22 configured indexes.
Processed 18 out of 22 configured indexes.
Processed 20 out of 22 configured indexes.
Processed 22 out of 22 configured indexes.
Finished generating checksums for datamodel and report acceleration bucket summaries for all indexes.
ERROR: Failed to migrate mongo feature compatibility version:
ERROR while running migrate-kvstore migration.

I looked in the splunkd.log and mongod.log files, but no new events have been written since I shut down the service prior to starting the 'rpm' upgrade. Both files end with the related shutdown events, as shown below.

[root@ip-10-2-31-134 ~]# tail -n 10 /opt/splunk/var/log/splunk/splunkd.log
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Queue"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_CallbackRunner"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_HttpClient"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_DmcProxyHttpClient"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Duo2FAHttpClient"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_ApplicationLicenseChecker"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_S3ConnectionPoolManager"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - shutting down level "ShutdownLevel_TelemetryMetricBuffer"
10-11-2018 18:48:34.583 +0000 INFO  ShutdownHandler - Shutdown complete in 36.05 seconds
10-11-2018 18:48:35.581 +0000 INFO  loader - All pipelines finished.

[root@ip-10-2-31-134 ~]# tail -n 10 /opt/splunk/var/log/splunk/mongod.log
 2018-10-11T18:48:02.886Z I JOURNAL  [signalProcessingThread] old journal file /opt/splunk/var/lib/splunk/kvstore/mongo/journal/j._0 will be reused as /opt/splunk/var/lib/splunk/kvstore/mongo/journal/prealloc.0
 2018-10-11T18:48:02.887Z I JOURNAL  [signalProcessingThread] Terminating durability thread ...
 2018-10-11T18:48:02.986Z I JOURNAL  [journal writer] Journal writer thread stopped
 2018-10-11T18:48:02.986Z I JOURNAL  [durability] Durability thread stopped
 2018-10-11T18:48:02.986Z I STORAGE  [signalProcessingThread] shutdown: closing all files...
 2018-10-11T18:48:02.986Z I STORAGE  [signalProcessingThread] closeAllFiles() finished
 2018-10-11T18:48:02.986Z I STORAGE  [signalProcessingThread] shutdown: removing fs lock...
 2018-10-11T18:48:02.986Z I CONTROL  [signalProcessingThread] now exiting
 2018-10-11T18:48:02.986Z I CONTROL  [signalProcessingThread] shutting down with code:0
 2018-10-11T18:48:02.986Z I CONTROL  [initandlisten] shutting down with code:0

Thanks,
Erik

1 Solution

eandresen
Path Finder

Success...for my issues!!!

I believe I have solved all of the problems I was seeing, including the upgrade/migration failures, the kvstore not starting, and the SSL error message, by removing the double quotes around the “tls1.2” value for the sslVersions setting in a custom app we deploy to our instances. I am still working through the upgrades of the other instances in the environment to confirm this.

[root@ip-10-2-29-7 tmp]# /opt/splunk/bin/splunk cmd btool server list --debug | grep tls
/opt/splunk/etc/system/default/server.conf sslVersions = tls1.2
/opt/splunk/etc/system/default/server.conf sslVersions = tls1.2
/opt/splunk/etc/apps/aws-poc-test-us-east-1-infrastructure-outputs/local/server.conf sslVersions = tls1.2
/opt/splunk/etc/system/default/server.conf sslVersionsForClient = tls1.2
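
For anyone wanting to check their own instances before upgrading, here is a minimal sketch of how the quoted values could be located. The function name `find_quoted_ssl_versions` is made up for illustration, and the default of /opt/splunk/etc is an assumption based on the paths in the output above. It only reads files and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): list every sslVersions* setting
# whose value starts with a double quote, in any server.conf below a directory.
# Assumption: Splunk's config tree is at /opt/splunk/etc unless a path is given.
find_quoted_ssl_versions() {
    conf_root="${1:-/opt/splunk/etc}"
    # /dev/null is passed as an extra file so grep always prints the file name,
    # even when find hands it a single server.conf.
    find "$conf_root" -type f -name server.conf \
        -exec grep -nE '^[[:space:]]*sslVersions[A-Za-z]*[[:space:]]*=[[:space:]]*"' /dev/null {} +
}
```

Running `find_quoted_ssl_versions` with no argument prints matches as file:line:setting. No output should mean no quoted values were found under that directory.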

Let me walk through the steps I did to finally get the Cluster Master successfully upgraded to v7.2.0, which are the same steps I am going to work through on the other Splunk Enterprise instances within our SplunkPOC environment.

  1. Commented out the highlighted line above (the sslVersions setting in the custom app's local/server.conf)
  2. Ran the upgrade/migration and started Splunk, no issue within the migration steps
  3. Confirmed v7.2.0 was running, confirmed the Mongod service was running
  4. Confirmed “./splunk show kvstore-status” was set to “ready”
  5. Uncommented the same sslVersions line, still with the double quotes, and restarted Splunk
  6. Confirmed everything started up
  7. Found that “./splunk show kvstore-status” was stuck at “starting”
  8. Removed the double quotes around the “tls1.2” value and restarted Splunk
  9. Confirmed everything started up
  10. Found that “./splunk show kvstore-status” was set to “ready”
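
The quote removal in the steps above could be scripted roughly as follows. This is only a sketch: the function name `strip_ssl_version_quotes` is hypothetical, and it assumes you edit the server.conf at the place the app is actually managed from, since a copy pushed out by a deployer would otherwise be overwritten. It keeps a .bak copy next to the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): remove the double quotes around
# the value of any sslVersions* setting in one server.conf file.
# The original file is preserved as <file>.bak before anything is rewritten.
strip_ssl_version_quotes() {
    conf_file="$1"
    cp -p "$conf_file" "$conf_file.bak" &&
    # Only lines of the form: sslVersions... = "value" are touched;
    # every other line, including other quoted settings, passes through as-is.
    sed -E 's/^([[:space:]]*sslVersions[A-Za-z]*[[:space:]]*=[[:space:]]*)"([^"]*)"[[:space:]]*$/\1\2/' \
        "$conf_file.bak" > "$conf_file"
}
```

For example, `strip_ssl_version_quotes /path/to/app/local/server.conf` (placeholder path), followed by a Splunk restart as in the steps above.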

Below are the second and third outputs from the “./splunk show kvstore-status” command as outlined above; the first one scrolled off the terminal screen before I could grab it.

[root@ip-10-2-29-7 tmp]# /opt/splunk/bin/splunk show kvstore-status
Your session is invalid. Please login.
Splunk username: admin
Password:
This member:
backupRestoreStatus : Ready
disabled : 0
guid : 386AC707-E7CA-4827-9E6A-2116283D9727
port : 8191
standalone : 1
status : starting

[root@ip-10-2-29-7 tmp]# /opt/splunk/bin/splunk show kvstore-status
Your session is invalid. Please login.
Splunk username: admin
Password:
This member:
backupRestoreStatus : Ready
date : Fri Nov 2 19:42:38 2018
dateSec : 1541187758.074
disabled : 0
guid : 386AC707-E7CA-4827-9E6A-2116283D9727
oplogEndTimestamp : Fri Nov 2 19:42:37 2018
oplogEndTimestampSec : 1541187757
oplogStartTimestamp : Wed Aug 29 22:16:47 2018
oplogStartTimestampSec : 1535581007
port : 8191
replicaSet : 386AC707-E7CA-4827-9E6A-2116283D9727
replicationStatus : KV store captain
standalone : 1
status : ready

KV store members:
127.0.0.1:8191
configVersion : 1
electionDate : Fri Nov 2 19:42:26 2018
electionDateSec : 1541187746
hostAndPort : 127.0.0.1:8191
optimeDate : Fri Nov 2 19:42:37 2018
optimeDateSec : 1541187757
replicationStatus : KV store captain
uptime : 13
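
When repeating this check on many instances, the status field can be pulled out of output like the above with a small filter. This is a sketch only: the function name `kvstore_status_from_output` is hypothetical, and it assumes the output keeps the "name : value" layout shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): read `splunk show kvstore-status`
# output on stdin and print the value of the first field named exactly
# "status" (so backupRestoreStatus and replicationStatus are ignored).
kvstore_status_from_output() {
    awk -F' *: *' '$1 ~ /^[[:space:]]*status$/ { print $2; exit }'
}
```

It could be used as `/opt/splunk/bin/splunk show kvstore-status | kvstore_status_from_output`, which for the two outputs above would give "starting" and "ready" respectively.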

I am working through upgrading the rest of our SplunkPOC environment, starting with the Search Cluster, then the Indexing Cluster, followed by the other parts, using the process outlined above. I am going to start by just removing the double quotes around the “tls1.2” value for the sslVersions setting and seeing if the upgrade/migration completes without any issues. If the upgrade/migration still fails, I will complete the entire process as outlined above.

erichymowitz
Engager

Can you indicate what specific changes you made to which specific files to resolve the problem?
