Hi everyone,
I have a stand-alone deployment server set up on a CentOS 7 Linux VM with 8 cores and 8 GB RAM, running Splunk 6.2.8. This server currently manages about 150 clients, and with this setup I cannot imagine scaling beyond 500 clients.
A few (about 30 clients) are set up in deploymentclient.conf with the default phoneHomeIntervalInSecs, but the rest (configured after we realized this) are set up with an interval of 600 seconds (10 minutes).
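For reference, the client-side setting on the newer clients looks roughly like the sketch below. The host name and port are placeholders rather than my real values, and the roughly 30 older clients simply omit the phoneHomeIntervalInSecs line and so fall back to the default (60 seconds, as far as I know).

```
# deploymentclient.conf on a client (sketch; targetUri is a placeholder)
[deployment-client]
# Check in with the deployment server every 10 minutes
phoneHomeIntervalInSecs = 600

[target-broker:deploymentServer]
targetUri = deploymentserver.example.com:8089
```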
The server is behaving strangely though. Every 10 minutes or so (imagine that), the CPU usage spikes considerably, and I get load averages ranging from 12-20. This lasts for about 5 minutes and then everything dies down and becomes normal for a few minutes.
Okay, so things I have tried:
- Set ulimit -n to 8192 (confirmed for the splunk user).
- Configured a DNS server, as per the known issue for 6.2.0: "Splunk Web becomes unreachable if an enabled deployment server in the same instance cannot access DNS." (SPL-28471)
- Even though I'm running in standalone mode, I've disabled the "Deployment server" role in the Distributed Management Console, so it only has "Indexer" and "Search Head" selected. This was done based on the recommendation on this page: "Do not host a distributed management console, which is essentially a search head, on a deployment server with more than 50 clients."
The only thing I can think of that I might be doing wrong is the DMC setup. Other than that, splunkd.log has a bunch of broken pipe entries roughly every 10 minutes (which correlates with the problem), and then it chugs along happily for the next 5 minutes.
Surely I'm missing something silly.
Please help?