One of the SHC members is going up and down every 5 minutes. The KV Store first got stuck at "starting" and is now stuck at "initial sync".
The member is running fine on the backend, but it flaps between up and down on the search head clustering page. I tried a splunk restart on that member server; the KV Store started rebuilding but got stuck at each phase, first at starting and now at the initial sync.
Any help/inputs appreciated @rbal_splunk
Thank you
The solution to the above issue was to recreate the SSL certs, even though my internal certs for the server had not expired.
I recreated the certs on all SHC members and did a KV store cleanup on the one that was stuck.
It resynced in 20 minutes 🙂
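For anyone wondering what the KV store cleanup on the stuck member could look like, here is a minimal sketch. It assumes Splunk is installed under /opt/splunk and that the cleanup was done with `splunk clean kvstore --local`; the post does not say exactly which command was used. The `run` helper and the `DRY_RUN` switch are my own additions, so by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: reset the local KV store on ONE stuck SHC member so it can resync.
# Assumptions (not from the original post): install path, dry-run helper.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_local_kvstore() {
    run "$SPLUNK_HOME/bin/splunk" stop
    # Wipes only this member's local KV store data; Splunk asks for confirmation.
    run "$SPLUNK_HOME/bin/splunk" clean kvstore --local
    run "$SPLUNK_HOME/bin/splunk" start
    # Shows the KV store state so you can watch it leave "starting".
    run "$SPLUNK_HOME/bin/splunk" show kvstore-status
}

reset_local_kvstore
```

Review the printed commands first, then rerun with `DRY_RUN=0` on the affected member only.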
To check expiration:
openssl x509 -enddate -noout -in /opt/splunk/etc/auth/server.pem
To recreate the certs with ./splunk createssl:
cd /opt/splunk/etc/auth && /opt/splunk/bin/splunk createssl server-cert -d . -n server
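If you want to run the expiration check on several members, something like the sketch below may help. It uses openssl's `-checkend` option to flag certs that expire within a given number of days. The `check_cert` function, the 30-day threshold, and the inclusion of cacert.pem are my assumptions, not part of the original fix. Keep in mind that in this case the certs were not expired and recreating them still helped, so a clean result here does not rule out a cert problem.

```shell
#!/bin/sh
# Sketch: flag Splunk certs that expire soon. Paths and threshold are assumptions.
AUTH_DIR="${AUTH_DIR:-/opt/splunk/etc/auth}"
WARN_DAYS="${WARN_DAYS:-30}"

# Prints "OK <file>" if the cert is still valid in <days> days,
# otherwise "CHECK <file>" (expiring, expired, or not readable as a cert).
check_cert() {
    cert="$1"
    days="$2"
    # -checkend N exits 0 when the cert will not expire within N seconds.
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
        echo "OK $cert"
    else
        echo "CHECK $cert"
    fi
}

for cert in "$AUTH_DIR"/server.pem "$AUTH_DIR"/cacert.pem; do
    [ -f "$cert" ] || continue          # skip files that do not exist here
    check_cert "$cert" "$WARN_DAYS"
    openssl x509 -enddate -noout -in "$cert"   # same check as in the post
done
```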
If you can't find any reason in Splunk's internal logs and/or Splunk Support can't help you, I suggest removing that node from the SHC, then reinstalling it and joining it to this SHC as a new member. Of course, you should also look at the OS-level logs.
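A rough sketch of the remove and rejoin steps, in case it helps. It assumes /opt/splunk as the install path and uses a hypothetical URI (https://sh1.example.com:8089) for an existing healthy member. The reinstall itself and the `splunk init shcluster-config` step on the fresh instance are not shown, since their arguments depend on your environment. As written it only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: take a member out of the SHC and later join it again.
# Assumptions (not from the thread): install path, member URI, dry-run helper.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"
CURRENT_MEMBER_URI="${CURRENT_MEMBER_URI:-https://sh1.example.com:8089}"  # hypothetical
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the member that should leave the cluster.
remove_member() {
    run "$SPLUNK_HOME/bin/splunk" remove shcluster-member
}

# Run on the reinstalled instance, after it has been initialized for the SHC.
rejoin_member() {
    run "$SPLUNK_HOME/bin/splunk" add shcluster-member -current_member_uri "$CURRENT_MEMBER_URI"
}

case "${1:-}" in
    remove) remove_member ;;
    rejoin) rejoin_member ;;
    *)      echo "usage: $0 remove|rejoin" ;;
esac
```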
r. Ismo