I have Splunk 6 Enterprise installed on a system with 2x 10-core 3GHz Xeons, 128GB RAM and a 6x SSD RAID-10. When I run searches I notice that Splunk seems to use no more than 6 CPU cores despite having 40 CPU cores in total. This is particularly troubling when I do heavy CPU-bound operations like rex and iplocation. I've noticed this even when searching across a large time range, like 3 months. During the search, I/O is very low (relative to the SSD RAID) with no IOWAIT, and RAM usage is low as well. The only bottleneck I see is that the cores in use are at 100%.
Is there some way to tell Splunk to use more CPU cores? My understanding is that the default will already be quite aggressive about multithreading, but perhaps there is some hard-coded upper limit?
Here's the view from top during an iplocation-bound query:
top - 23:38:07 up 7:48, 4 users, load average: 6.36, 6.44, 5.63
Tasks: 425 total, 1 running, 424 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.1 us, 2.5 sy, 0.0 ni, 83.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 13198851+total, 12997244+used, 2016076 free, 82992 buffers
KiB Swap: 13416960+total, 105568 used, 13406403+free. 12418592+cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4836 root 20 0 1334636 637356 15148 S 556.3 0.5 874:09.35 splunkd
38847 root 20 0 2018372 567512 37936 S 101.0 0.4 25:32.94 splunkd
5620 root 20 0 1904932 152960 4888 S 6.6 0.1 8:03.83 python
Here's the actual query I'm testing:
* | iplocation client_ip | geostats count
A system load of 6.36 might normally be a good indication of a taxing job in progress, but with 40 CPUs, that's only 15.9% CPU usage for this machine.
Here's a view of htop showing the cores in use and what appears to be 6 splunkd threads (I suppose that's the problem):
1 [||||||| 30.1%] 11 [|||||||| 31.0%] 21 [ 0.0%] 31 [ 0.0%]
2 [| 0.6%] 12 [ 0.0%] 22 [ 0.0%] 32 [ 0.0%]
3 [||||||||||||||||92.2%] 13 [||||||||||||||||92.9%] 23 [|| 2.6%] 33 [ 0.0%]
4 [ 0.0%] 14 [ 0.0%] 24 [ 0.0%] 34 [|| 1.3%]
5 [||||||||||||||||94.1%] 15 [||||||||||||||||94.1%] 25 [ 0.0%] 35 [ 0.0%]
6 [ 0.0%] 16 [ 0.0%] 26 [ 0.0%] 36 [ 0.0%]
7 [| 1.3%] 17 [||||||||||||||| 64.9%] 27 [ 0.0%] 37 [ 0.0%]
8 [ 0.0%] 18 [ 0.0%] 28 [ 0.0%] 38 [ 0.0%]
9 [ 0.0%] 19 [||||||||||||||| 66.9%] 29 [| 0.6%] 39 [ 0.0%]
10 [ 0.0%] 20 [|||||||||||||||100.0%] 30 [ 0.0%] 40 [ 0.0%]
Mem[||||||||||||||||||||||||||||||||||||||5654/128895MB] Tasks: 43, 84 thr; 8 running
Swp[| 103/131024MB] Load average: 6.30 6.40 5.89
Uptime: 07:55:06
├─ splunkd -p 8089 restart
├─ splunkd -p 8089 restart
└─ [splunkd pid=4836] splunkd -p 8089 restart [process-runner]
├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --lookups=
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ └─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
└─ /opt/splunk/bin/splunkd instrument-resource-usage
Lucas K mentioned that you have headroom in your vertical search capacity, and there's a way to take advantage of that. You have enough cores and a downright enviable number of IOPS to run multiple Splunk instances. You should see near-linear performance gains as you add indexer instances to this host, and this will make it even easier to transition to a traditional scale-out architecture later on.
I'd probably set up a search head and two indexers in a distributed search configuration to start; I've personally run multiple IPs on the same host to do this, but you can just assign separate ports for management/receiving as well. Maybe even run a separate heavy forwarder instance to forward to the indexers while you're at it. Distributed architecture seems like a core concept of Splunk, but I'm not aware of any "rules" against using the tgz installer and building vertically when you have that kind of hardware. Have fun!!
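To make that concrete, here's roughly what I mean, assuming three tgz installs in separate SPLUNK_HOME directories. The paths and ports below are just examples I picked, not anything Splunk mandates:
# Indexer 1 in /opt/splunk_idx1
#   web.conf:    [settings]  mgmtHostPort = 127.0.0.1:8190
#   inputs.conf: [splunktcp://9997]
# Indexer 2 in /opt/splunk_idx2
#   web.conf:    [settings]  mgmtHostPort = 127.0.0.1:8290
#   inputs.conf: [splunktcp://9998]
# Search head in /opt/splunk_sh, distsearch.conf:
[distributedSearch]
servers = https://127.0.0.1:8190,https://127.0.0.1:8290
You'd still need to establish trust between the search head and the peers (running "splunk add search-server" on the search head handles that), but that's the gist of it.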
Could you elaborate on this? Are you suggesting running multiple instances on the same machine? With two indexers, do you have to choose which indexer each forwarder goes to, or do you give the forwarder both?
Yes, I was suggesting running multiple instances on the same machine and giving the forwarder both indexers to connect to. However, times have changed with later versions of Splunk and I think this is now considered an anti-pattern because parallelization support is built in. Consider reviewing the .conf 2016 presentation "Harnessing Performance and Scalability with Parallelization".
Slides and recording: https://conf.splunk.com/sessions/2016-sessions.html
Documentation: https://docs.splunk.com/Documentation/Splunk/6.5.2/Capacity/Parallelization
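For anyone landing here later, the knobs that presentation and doc page cover are, as far as I recall, along these lines (treat the exact names and values as something to verify against the doc for your version):
# server.conf on the indexer -- run more than one ingestion pipeline set
[general]
parallelIngestionPipelines = 2
# limits.conf -- let batch-mode searches use more than one search pipeline
[search]
allow_batch_mode = true
batch_search_max_pipeline = 2
These help specific workloads (indexing throughput, dense batch-mode searches) rather than acting as a general "use all 40 cores" switch.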
Thank you for pulling me out of that rabbit hole.
Thanks - this is a great suggestion 🙂
There are similar configs in limits.conf, e.g. the maximum number of concurrent search jobs, but they factor in the number of available cores.
Things might be faster if you distributed your search over several search peers... depending on the type of search, amount of data, yada yada.
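To put rough numbers on the limits.conf point above: the concurrency ceiling is derived from the core count rather than being a flat thread limit. If I remember the defaults correctly (worth double-checking in limits.conf.spec for your version), it works out like this:
# limits.conf [search] -- defaults, from memory
[search]
base_max_searches = 6
max_searches_per_cpu = 1
# concurrent historical searches allowed ~= max_searches_per_cpu * #CPUs + base_max_searches
# e.g. 1 * 40 + 6 = 46 concurrent searches on this box, but each individual search still gets one core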
While your hyperthreading Xeons may indicate 40 CPUs, you only have 20 physical cores. 7 out of 20 still has room to grow, but it doesn't sound as bad as 7 out of 40 🙂
That is a good point; I suppose 7 out of 20 isn't as bad. It's just that, as a developer, I know someone put something like "max_search_threads=6" somewhere, and if I increased that number it would certainly be faster.
...continued
This is the better situation to be in. The reverse is much harder to fix. Even when load balancing using SNMP metrics, you can really struggle to get optimal search concurrency across separate search heads (I'm experiencing this issue right now).
The performance issue you're describing with specific commands like iplocation is that you are (as LGuinn already said) allocated 1 core per search. Unfortunately, that is just how it works.
Splunk doesn't work the way you'd expect it to in terms of CPU utilisation. It won't go and use 40 CPUs just because they're available (I wish it would); rather, it allows a specific number of searches to run.
With how your server is provisioned, what you have is vertical search capacity. That is, you can run more searches (higher search concurrency) and support more logged-in people before you run into search contention/queuing.
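If you want to confirm it's concurrency rather than per-search threading that you're bounded by, you can count the searches in flight at any moment. Something like this run from the search head should do it (a sketch; the dispatchState values may differ slightly between versions):
| rest /services/search/jobs | search dispatchState=RUNNING | stats count AS running_searches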
I believe that Splunk will use one core for each search, one core for each logged-in user, and some number of cores (I am no longer sure of the max) for indexing.
Is this machine a search head or an indexer or both?
Finally, this is why you should follow the Splunk machine sizing recommendations - to get more value for your money. I would have probably purchased several commodity-sized servers instead of one big one...
The system is the search head and the indexer. Currently we are more constrained on physical space and power than on cost, plus our volume is low (under 5GB/day), so I figured a single system with super-fast CPU and I/O would be preferred in my case.
I took the Hardware Capacity Planning Questionnaire and each answer was "no", which indicates that a single machine is the recommended configuration in my case, and I basically took the recommended hardware specs and bumped them up a bit.
I'll keep your advice in mind, but I really don't want a Hadoop-style cluster 😞