We have 40k+ clients reporting to a single deployment server (DS), and we are seeing performance degradation on that DS. As per best practice, I understand that no more than 10K clients should report to one DS. Please suggest a way forward on this.
Do we need to add a separate DS and move half of the clients to it, or is there a more specific solution for this scenario?
First, check whether the current installation can be made to work optimally by tuning the polling (phone home) interval. The key drivers are the size of the apps and the rate at which new or changed apps are deployed to the forwarders.
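As a rough sketch of what tuning the interval could look like, assuming the phoneHomeIntervalInSecs setting in deploymentclient.conf on the forwarders (default 60 seconds). The 600-second value is only an illustrative choice, not a recommendation. As a back-of-the-envelope figure, 40,000 clients polling every 60 seconds average about 667 phone-home requests per second, while a 600-second interval brings that down to about 67 per second.

```ini
# deploymentclient.conf on each forwarder (illustrative value)
[deployment-client]
# Seconds between phone-home calls to the DS; the default is 60.
# A longer interval lowers DS load but delays delivery of app changes.
phoneHomeIntervalInSecs = 600
```

The trade-off is latency: with a longer interval, a new or changed app can take up to that long to reach a given forwarder.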
Otherwise, you can scale out your DS, making sure that all DS instances have the same set of configurations. The choice is between a horizontal and a tiered approach, and it also determines serverclass.conf attributes such as stateOnClient for the apps on the intermediate DS and on the forwarders. For a scaled DS you also need to whitelist forwarders according to which apps will or will not be deployed to them, along with the state each app should report back, i.e. stateOnClient = enabled / noop.
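A minimal sketch of how stateOnClient might differ between tiers in a tiered setup. The server class names, host patterns, and app name below are hypothetical placeholders, and this assumes the intermediate DS is itself a deployment client of a top-level DS; the extra settings the intermediate DS needs in order to store received apps in its own deployment-apps directory are not shown.

```ini
# serverclass.conf on the top-level DS (hypothetical names)
[serverClass:intermediate_ds]
whitelist.0 = ids-*.example.com

[serverClass:intermediate_ds:app:uf_outputs]
# noop: the app is delivered, but its enabled/disabled state is not changed,
# so it does not take effect on the intermediate DS itself
stateOnClient = noop
restartSplunkd = false
```

```ini
# serverclass.conf on each intermediate DS (hypothetical names)
[serverClass:linux_forwarders]
whitelist.0 = uf-linux-*

[serverClass:linux_forwarders:app:uf_outputs]
# enabled: the app is activated on the forwarders that match the whitelist
stateOnClient = enabled
restartSplunkd = true
```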
For horizontal scaling, putting a load balancer in front of the DS instances is a good approach.
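For illustration, with a load balancer in place each forwarder would point at the balancer's address rather than at an individual DS. The hostname below is a made-up placeholder, and 8089 is assumed to be the management port in use.

```ini
# deploymentclient.conf on each forwarder
[target-broker:deploymentServer]
# Hypothetical load balancer address in front of several DS instances
# that all carry the same serverclass.conf and deployment apps
targetUri = ds-lb.example.com:8089
```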