I'm a Splunk PS consultant and have been assisting a client with upgrades and a migration to an SVA-compliant architecture (C1). All is well and fully operational on 9.0.2, and the client is happy with this improved and fully compliant deployment.
Following on from that work, we reviewed what sensible security hardening could be implemented across the deployment, and we agreed that the pass4SymmKey for the clustering stanza could be longer and more complex. We followed the docs, went to each instance's $SPLUNK_HOME/etc/system/local/server.conf, and updated the key in plain text. We then restarted splunkd via systemd on all instances and checked the infrastructure. All functional: the cluster remains operating properly, ingesting data, with clustering operations correct.
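For anyone reproducing this, the rotation amounts to updating the `[clustering]` stanza on each cluster node; a minimal sketch (the key value here is purely illustrative):

```ini
# $SPLUNK_HOME/etc/system/local/server.conf
# Set the same plaintext key on every node in the cluster.
# splunkd encrypts it to a $7$... value on the next restart.
[clustering]
pass4SymmKey = MyNewLongerKeyValue
```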
However... there is one flaw, and that is the MC. It is no longer able to properly query the cluster. It also hosts the DS, which is working properly and serving apps to clients. The MC has all the search parameters correct and all nodes listed, and it was functional immediately before the rotation. Yes, I checked btool for the values on disk and decrypted them; all appears fine. After an hour of troubleshooting and checking splunkd.log there was still no clue, but we thought perhaps we had gone too complex on the string with special characters.
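For reference, the on-disk check we ran is along these lines (the encrypted value shown is a placeholder, not the real one):

```shell
# Show the effective [clustering] settings and which file each comes from
$SPLUNK_HOME/bin/splunk btool server list clustering --debug

# Decrypt a stored $7$... value to confirm it matches the key that was set
$SPLUNK_HOME/bin/splunk show-decrypted --value '$7$...'
```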
Rinse and repeat: we updated the pass4SymmKey on all cluster nodes to something less complex, without special characters. It still failed to operate properly, and we spent another hour very carefully reviewing every stanza for operation and consistency. We then decided to set up an MC on another node to compare: exact same issue, and all checks just come back greyed out.
With time pushing on, we decided to revert to the original pass4SymmKey and restart the daemon. Guess what: still not working. We moved on to other pressing matters, but I do not want to leave my client without an answer or a medium-term approach.
Potential for a bug? A niche operation, rotating pass4SymmKey?
Apologies for the delayed response @isoutamo and resolution, I have only just been back to my client. This is now resolved.
Solution:
This had not been obvious, as the LDAP app had been packaged previously but not consistently deployed between the test environment and the production environment. This other posting also proved useful. I don't think the Admin / Clustering / Core Implementation course points out this importance, and generally not much is written about the MC. Perhaps we can improve the course material; definitely good experience gained personally.
The client was satisfied with the reproduction of the issue and considers the problem closed.
Hi
Are you absolutely sure that you have updated the clustering stanza on the DS too, and not the general (or another) stanza? Have you tried first removing the cluster manager node (search peer) configuration from the DS and then adding it again with the new key?
r. Ismo
Thanks @isoutamo. I'm very confident in the consistency of the key; I checked it thoroughly and deliberately, and ran ./splunk show-decrypted --value '<string>' on each node for each stanza. I will try your suggestion of deleting and re-adding all search peers on the MC with a reload.
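For anyone else attempting this, deleting and re-adding a search peer can be done from the CLI as well as from the MC UI; a sketch with illustrative hostnames and credentials:

```shell
# On the MC: remove the stale peer entry, then re-add it so the
# distributed-search trust is re-established with current credentials
$SPLUNK_HOME/bin/splunk remove search-server https://idx01.example.com:8089 \
    -auth admin:changeme
$SPLUNK_HOME/bin/splunk add search-server https://idx01.example.com:8089 \
    -auth admin:changeme -remoteUsername admin -remotePassword changeme
```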