Hi! I'm trying to continue to tune our Splunk Search Heads (We currently have 15!). I'm noticing a few odd behaviors and wanted to see what I could do.
1) Over time, the number of PendingDiscard messages keeps piling up
2) Our Search Head CPU and Memory usage is relatively low to medium
3) Over time, our concurrency counts keep growing (I need to lower my limits, which may also help here)
4) On our search heads, executor_workers is set to the default of 10.
executor_workers = <positive integer>
* Only valid if 'mode=master' or 'mode=slave'.
* Number of threads that can be used by the clustering thread pool.
* A value of 0 defaults to 1.
* Default: 10
Which leads to my question. I've seen a couple of Answers posts about setting this on the Search Head, but the docs for 7.2.10 read to me as though it's designed more for the Indexers. Would adding more executor workers allow more searches to run? What I'm seeing is no more than 15-25 searches running on my 16-core (32 vCPU) box, even though my limits are much higher than that. And the box isn't even breathing hard.
Can someone offer some advice here?
Splunk uses several limits.conf attributes, not executor_workers, to calculate the number of concurrent searches that can run on a SH. (See This)
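As a rough sketch of that calculation: the documented limits.conf formula is max_searches_per_cpu x number of CPUs + base_max_searches for historical searches, with max_searches_perc of that total available to the scheduler. The function name below is made up for illustration, the defaults (1, 6, and 50) are the documented ones, and rounding down is an assumption. It also does not model any cluster-wide quota logic the captain may apply in a search head cluster.

```python
def search_concurrency(cpus, max_searches_per_cpu=1, base_max_searches=6,
                       max_searches_perc=50):
    """Estimate per-search-head concurrency limits from limits.conf values.

    Hypothetical helper, not a Splunk API. Assumes integer rounding down.
    """
    # Total concurrent historical searches allowed on this search head.
    max_hist = max_searches_per_cpu * cpus + base_max_searches
    # Portion of that total the scheduler may use for scheduled searches.
    max_sched = max_hist * max_searches_perc // 100
    return max_hist, max_sched


if __name__ == "__main__":
    total, scheduled = search_concurrency(cpus=32)
    print(f"historical limit: {total}, scheduler limit: {scheduled}")
```

With the defaults, a 32 vCPU search head comes out to 38 total and 19 scheduled, so it is worth checking which values your search heads actually resolve (for example with btool) before changing thread pool settings.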
Ah. So I'm back to my limits.conf, then?
It's currently this:
dispatch_dir_warning_size = 10000
max_rawsize_perchunk = 0
It's 15 servers, most of them 16x2 (32 vCPU), and they don't even seem to want to service many searches. If I look at the Monitoring Console at any one point in time, a couple of the servers could have 0 searches running, yet I can see some in queue, which is odd. I was wondering more whether the Captain is getting overloaded with searches and needs more room? It's also set to run ad hoc searches only, so it's not even delegating to itself.
I do have 2 VMs in that farm, and they have about 1/2 the power, so I may also see about pulling those out, in case the captain is basing my load on what the VMs could handle?
Edited to add: Also, I have 39 indexers, and none of those servers are running hot either, and all boxes are in the same data center, so I don't think it's network congestion or IOPS.