I have Splunk 6 Enterprise installed on a system with 2x 10-core 3GHz Xeons, 128GB RAM and a 6x SSD RAID-10. When I run searches, Splunk seems to use no more than 6 CPU cores even though the system exposes 40 logical cores. This is particularly troubling for heavy CPU-bound operations like rex and iplocation. I see this even when searching across a large time range, such as 3 months. During the search, I/O is very low (relative to the SSD RAID) with no IOWAIT, and RAM usage is low as well. The only bottleneck I see is that the cores in use are at 100%.
Is there some way to tell Splunk to use more CPU cores? My understanding is that by default it will already be quite aggressive in multithreading, but perhaps there is some hard-coded upper limit?
Here's the view from top during an iplocation-bound query:
top - 23:38:07 up 7:48, 4 users, load average: 6.36, 6.44, 5.63
Tasks: 425 total, 1 running, 424 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.1 us, 2.5 sy, 0.0 ni, 83.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 13198851+total, 12997244+used, 2016076 free, 82992 buffers
KiB Swap: 13416960+total, 105568 used, 13406403+free. 12418592+cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4836 root 20 0 1334636 637356 15148 S 556.3 0.5 874:09.35 splunkd
38847 root 20 0 2018372 567512 37936 S 101.0 0.4 25:32.94 splunkd
5620 root 20 0 1904932 152960 4888 S 6.6 0.1 8:03.83 python
Here's the actual query I'm testing:
* | iplocation client_ip | geostats count
A system load of 6.36 might normally be a good indication of a taxing job in progress, but with 40 CPUs, that's only 15.9% CPU usage for this machine.
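For reference, the 15.9% figure is just the 1-minute load average divided by the number of logical CPUs. A minimal sketch of that arithmetic follows. The function name is my own, and treating load average as a stand-in for CPU utilization is only a rough approximation that seems reasonable here because there is no I/O wait:

```python
def load_as_fraction_of_cpus(load_average: float, logical_cpus: int) -> float:
    """Return the load average as a fraction of the available logical CPUs.

    Hypothetical helper for illustration; load average only approximates
    CPU utilization when tasks are not blocked on I/O.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return load_average / logical_cpus


# Figures taken from the top output above: load 6.36 on 40 logical CPUs.
utilization = load_as_fraction_of_cpus(6.36, 40)
print(f"{utilization:.1%}")
```

The %Cpu(s) line from top tells a similar story: 14.1 us + 2.5 sy is about 16.6% busy across all CPUs.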
Here's a view of htop showing the cores in use and what appears to be 6 splunkd threads (I suppose that's the problem):
1 [||||||| 30.1%] 11 [|||||||| 31.0%] 21 [ 0.0%] 31 [ 0.0%]
2 [| 0.6%] 12 [ 0.0%] 22 [ 0.0%] 32 [ 0.0%]
3 [||||||||||||||||92.2%] 13 [||||||||||||||||92.9%] 23 [|| 2.6%] 33 [ 0.0%]
4 [ 0.0%] 14 [ 0.0%] 24 [ 0.0%] 34 [|| 1.3%]
5 [||||||||||||||||94.1%] 15 [||||||||||||||||94.1%] 25 [ 0.0%] 35 [ 0.0%]
6 [ 0.0%] 16 [ 0.0%] 26 [ 0.0%] 36 [ 0.0%]
7 [| 1.3%] 17 [||||||||||||||| 64.9%] 27 [ 0.0%] 37 [ 0.0%]
8 [ 0.0%] 18 [ 0.0%] 28 [ 0.0%] 38 [ 0.0%]
9 [ 0.0%] 19 [||||||||||||||| 66.9%] 29 [| 0.6%] 39 [ 0.0%]
10 [ 0.0%] 20 [|||||||||||||||100.0%] 30 [ 0.0%] 40 [ 0.0%]
Mem[||||||||||||||||||||||||||||||||||||||5654/128895MB] Tasks: 43, 84 thr; 8 running
Swp[| 103/131024MB] Load average: 6.30 6.40 5.89
Uptime: 07:55:06
├─ splunkd -p 8089 restart
├─ splunkd -p 8089 restart
└─ [splunkd pid=4836] splunkd -p 8089 restart [process-runner]
├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --lookups=
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ ├─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
│ └─ [splunkd pid=4836] search --id=1405480372.228 --maxbuckets=0 --ttl=600 --maxout=500000 --maxtime=8640000 --looku
└─ /opt/splunk/bin/splunkd instrument-resource-usage