I have about 2658 devices checking into our deployment server (CentOS 6.6, x64, Splunk 6.41).
Overall we sit around 10-20% CPU with plenty of memory free, but the overall UI performance is becoming basically unusable. I am guessing there are some performance tweaks I need to make, but I really haven't seen any guides on this.
Probably worth mentioning 99% of the clients have a 2 hour check-in time. But about 20 servers (other Splunk servers) are set to every 2 minutes.
top - 19:09:45 up 319 days, 1:44, 2 users, load average: 0.72, 0.63, 0.40
Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
Cpu(s): 20.9%us, 7.8%sy, 0.0%ni, 69.0%id, 1.7%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 16333660k total, 15423556k used, 910104k free, 191868k buffers
Swap: 8388604k total, 375048k used, 8013556k free, 8733040k cached
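To put those check-in intervals in perspective, here is a rough back-of-the-envelope sketch of the aggregate check-in rate the deployment server sees. The helper name `checkins_per_second` is made up for illustration, and treating "99%" as "everything except the 20 fast servers" is an assumption.

```python
def checkins_per_second(groups):
    """Sum the check-in rate over (client_count, interval_seconds) pairs."""
    return sum(count / interval for count, interval in groups)

total_clients = 2658
fast_clients = 20                            # other Splunk servers, every 2 minutes
slow_clients = total_clients - fast_clients  # assumption: everyone else, every 2 hours

rate = checkins_per_second([
    (slow_clients, 2 * 60 * 60),  # 2-hour interval in seconds
    (fast_clients, 2 * 60),       # 2-minute interval in seconds
])
print(f"{rate:.2f} check-ins per second")  # prints "0.53 check-ins per second"
```

Under these assumptions the server handles well under one check-in per second, and the 20 fast servers account for roughly a third of that, which fits with the low CPU figures in the top output.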
I have a deployment server with triple the number of deployment clients. Same symptoms: the server is bored (heavily underutilized) and the GUI is unusable (huge delays after each click).
I observed that the browser on my laptop (which I use for administering the deployment server web GUI) consumes all the RAM on my laptop.
I have never before seen a browser process consume more than 2 GB of RAM. The effect is independent of the browser used.
Maybe Splunk can help us by reducing the resource consumption in the browser (which implies a redesign of the GUI).
What browser are you using? Using Chrome and 6.4+, I've been on engagements with 4000+ clients where the GUI was responsive and quite smooth. In what particular area(s) are you seeing slowness, or is it "all" areas? I'd say there could be some slowness in the forwarder listing, but this shouldn't cascade across the whole deployment...
I am using Firefox, my colleague uses Chrome. We could also reproduce the effect with IE11.
Today our deployment server has 9500+ UFs, with Splunk release 6.5.2 on all servers and most of the UFs.
The slowness mostly happens while listing all the forwarders. To suffer from it, it is sufficient to have the deployment server open in one browser tab while working in a different tab. It slows down the complete browser.
As soon as you close the tab with the deployment server, the browser returns to normal speed after a few seconds.
Splunk support did not believe it at first, but we showed it to them in a WebEx session ... now they are looking into it.
Having exactly the same issue. I assume this is still occurring for you?
DS with around 20k UFs (the issue was already occurring when we had 10k).
Chrome, Firefox and IE all experience the issue.
As soon as the deployment server tab is closed, the browser is responsive again.
The DS is over-specced and is barely hitting 20% resource usage during peaks.
Logged a job with Support.
Have been progressing this issue with Support and received a glimmer of hope today:
They've located the relevant code which contributed to the issue and are currently
discussing a possible fix, since some of the changes will impact both the front end and the back end.
Here's hoping a fix is on the way!
Just received confirmation from Splunk support that a partial fix for the Deployment Server UI performance will be made available in 7.1.2 or 7.0.5 (next release). From the testing performed this should decrease the loading time by about 50% on very large instances.
In one environment I work in, however, we are sitting at about a 6 minute load time... so if this drops to 3 minutes it's a massive improvement, but 3 minutes is still unacceptable in my book.
I've followed this up by having an Enhancement Request (ER) logged, as suggested by Support, since further fixes will apparently require a more in-depth code review/change.
So hopefully in the near future things may be "better". I'll report our findings once it gets released.
We are seeing the same issue running Splunk 7.0.1 on a bare-metal server with "15 CPU and 15 GB RAM".
Average CPU utilization is 1%, and 13 GB of memory is used.
It is a dedicated server for deployment only, with 6000 clients dialing home every hour.
The deployment server is very responsive right after a restart, but after about two hours the GUI becomes painfully slow in the forwarder management section.
During this slowness the CPU only spikes to 20% and there is no change in memory utilization.