I upgraded Splunk from 7.1.0 to 7.2.1 on RHEL 7.5. Since the upgrade, splunkd takes noticeably longer to start, and the application has started crashing with the error below:
ERROR KVStorageProvider - An error occurred during the last operation ('listCollections', domain: '15', code: '13053'): No suitable servers found: `serverSelectionTimeoutMS` expired : [socket timeout calling ismaster on '127.0.0.1:8191']
I have also noticed heavy memory usage. Has anyone else seen this, or does anyone know the cause?
Have you looked at your settings for vm.overcommit_memory?
This appears to be related: https://docs.splunk.com/Documentation/Splunk/7.2.1/ReleaseNotes/LinuxmemoryovercommittingandSplunkcr...
And have you disabled Transparent Huge Pages? https://docs.splunk.com/Documentation/Splunk/7.2.1/ReleaseNotes/SplunkandTHP
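If it helps, here is a minimal sketch for reading both settings in one go. It assumes the standard Linux locations (/proc/sys/vm/overcommit_memory and /sys/kernel/mm/transparent_hugepage); the helper names are only illustrative, and the values it prints should be compared against the two documentation pages linked above rather than taken as a verdict.

```python
import re
from pathlib import Path

# Assumed standard kernel locations; adjust if your distribution differs.
OVERCOMMIT = Path("/proc/sys/vm/overcommit_memory")
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_setting(text):
    """Return the bracketed (active) value from a THP sysfs line."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_or_none(path):
    """Read a small kernel file, or return None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("vm.overcommit_memory =", read_or_none(OVERCOMMIT))
    for name in ("enabled", "defrag"):
        raw = read_or_none(THP_DIR / name)
        print(f"THP {name}: {active_setting(raw) if raw else 'not available'}")
```

The THP files list every option on one line and mark the active one with square brackets, which is why the sketch extracts the bracketed value.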
After your upgrade, did you verify that the OS is still honoring any ulimit settings that you had in place for your previous version of Splunk?
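One way to check this is to look at the limits the running process actually has, instead of what is configured for the user. The sketch below parses /proc/<pid>/limits; pass the splunkd PID as the first argument (without one it inspects itself, which is only useful as a demo). The function name and the selection of limits printed are my own choices, not anything from Splunk.

```python
import re
import sys


def parse_limits(text):
    """Map each limit name to its (soft, hard) values from /proc/<pid>/limits text."""
    limits = {}
    for line in text.splitlines():
        # Columns are separated by runs of at least two spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) < 3 or parts[0] == "Limit":
            continue  # skip the header row and blank lines
        limits[parts[0]] = (parts[1], parts[2])
    return limits


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/limits") as handle:
            limits = parse_limits(handle.read())
    except OSError as exc:
        print(f"could not read limits for {pid}: {exc}")
        limits = {}
    for name in ("Max open files", "Max processes", "Max data size", "Max file size"):
        print(name, limits.get(name))
```

If the soft values here are lower than what you set before the upgrade, the limits are probably not being applied to the way splunkd is started now.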
Also, you note that you are running RHEL 7.5. Depending on your kernel version, that may be where the Spectre/Meltdown mitigations were introduced. I've seen cases where these mitigations caused a 20-40% performance degradation.
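To see what your kernel reports, a rough sketch like the following prints the kernel release and the mitigation status. It assumes the kernel exposes /sys/devices/system/cpu/vulnerabilities, which is only present on kernels that carry the mitigation patches; if the directory is missing the sketch just says so. The function name is illustrative.

```python
import platform
from pathlib import Path

# Assumed location; only present on kernels that include the mitigation patches.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(directory=VULN_DIR):
    """Return {vulnerability name: kernel status line} for each file in the directory."""
    directory = Path(directory)
    if not directory.is_dir():
        return {}
    status = {}
    for entry in sorted(directory.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    print("kernel:", platform.release())
    report = read_mitigations()
    if not report:
        print("no vulnerabilities directory found on this kernel")
    for name, line in report.items():
        print(f"{name}: {line}")
```

Comparing this output between a host still on the old kernel and the upgraded one would show whether the mitigations changed along with the Splunk upgrade.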