We have a Splunk cluster consisting of search heads, indexers, heavy forwarders, and other Splunk instances (e.g. deployment server, cluster master, ...), as well as many Universal Forwarders.
We want to encrypt all inter-Splunk communications (both within the Splunk cluster and between Universal Forwarders and Heavy Forwarders) with custom certificates signed by a custom root CA. Setting this up initially should not be a problem, since there is plenty of documentation on the subject.
However, we cannot find any documentation for the scenario in which the root CA needs to be renewed. How can this be done without any downtime (or at least with minimal downtime)? All the scenarios we have seen so far require a big-bang approach, in which the cluster and the Universal Forwarders will not operate properly until all servers and clients have the new root CA.
But that would mean that in a setup with 100+ machines (including Universal Forwarders), logging would be inconsistent for quite some time, since replacing the root CA on 100+ machines takes a while.
Even with a root CA that has a lifetime of several years, there will come a point when it expires and must be replaced. If Splunk> does not have a viable solution for this scenario, using SSL encryption for inter-Splunk communication is impractical, to say the least.
I'm going through something similar in my environment, and it just takes some planning for the switchover. It also depends on how your certificates are configured. If you have a unique certificate for each forwarder, then it is certainly much more painful. A common configuration is to use wildcard certs for forwarders and unique certs for all servers. Create the new certs alongside the old ones and update the paths in your configs to point to the new certs without restarting Splunk. Then it is a matter of updating the deployment server to push the new certs to each forwarder, while simultaneously rolling the indexers and search heads with the CA changes so that those configs take effect. A brief outage will likely be required, but it shouldn't be extensive. You can also temporarily disable the SSL settings while the switch is being made. It definitely isn't easy, and done wrong it can break your entire Splunk environment.
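One way to shrink the switchover window (an assumption on my part, not something confirmed in this thread) is an overlap period in which every instance trusts both the old and the new root CA, so certificates signed by either one still validate while the new certs are rolled out. Below is a minimal Python sketch that builds such a combined CA bundle, assuming Splunk's `sslRootCAPath` setting accepts a PEM file containing several concatenated CA certificates. The function names and file names are hypothetical.

```python
import re
import ssl
from pathlib import Path
from typing import List

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_BLOCK_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def extract_pem_certs(pem_text: str) -> List[str]:
    """Return every PEM certificate block found in pem_text."""
    return [m.group(0) for m in PEM_BLOCK_RE.finditer(pem_text)]


def build_ca_bundle(*pem_texts: str) -> str:
    """Concatenate CA certificates from several PEM inputs into one bundle.

    Order is preserved (e.g. old root CA first, then new root CA), and
    certificates that decode to identical DER bytes are written only once.
    """
    seen = set()
    blocks = []
    for text in pem_texts:
        certs = extract_pem_certs(text)
        if not certs:
            raise ValueError("input contains no PEM certificate")
        for cert in certs:
            # Decoding to DER checks the PEM armor and ignores whitespace
            # differences when comparing certificates.
            der = ssl.PEM_cert_to_DER_cert(cert)
            if der not in seen:
                seen.add(der)
                # Re-encode so every block in the bundle has uniform layout.
                blocks.append(ssl.DER_cert_to_PEM_cert(der).strip())
    return "\n".join(blocks) + "\n"


def write_ca_bundle(old_ca: Path, new_ca: Path, bundle_path: Path) -> int:
    """Write old + new root CA into bundle_path; return the number of certs."""
    bundle = build_ca_bundle(old_ca.read_text(), new_ca.read_text())
    bundle_path.write_text(bundle)
    return len(extract_pem_certs(bundle))
```

For example, `write_ca_bundle(Path("oldRootCA.pem"), Path("newRootCA.pem"), Path("caBundle.pem"))` (hypothetical paths) produces a trust file that could be pushed to every instance first. The idea behind the design is that leaf certificates can then be swapped host by host, and the old CA dropped from the bundle only once nothing still presents a certificate signed by it. Whether this removes the need for the brief outage mentioned above depends on your Splunk version and configuration, so test it in a non-production environment first.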