Let me preface this question by stating that we do not currently have any major performance issues.
Our Splunk environment is running on a VM with 8 cores and 16 GB of RAM, using iSCSI NetApp storage. Again, everything is running fine. Except: we sometimes experience "Max number of concurrent searches reached". According to the docs, that maximum number is determined by the number of CPUs on the system, unless you override that in a config file somewhere (not recommended).
My first thought was to throw more CPU at the system. Then I recalled that in a VMware environment you can run into CPU contention: if you have 8 CPUs assigned to the VM, it needs to wait for 8 CPUs to be free in the cluster before it can do what it needs to do. So, as counter-intuitive as it may seem, fewer CPUs are sometimes faster than more CPUs.
We logged into the VMware console and verified that we are, in fact, experiencing significant CPU contention. Our admins have actually recommended reducing the number of assigned CPUs.
I know Splunk has a pretty clear formula for the recommended number of CPUs, but that seems to apply to a physical environment. Is anyone else here running Splunk in a virtual environment? If so, how are you dealing with CPU contention?
My other question: I need to work around the "maximum number of concurrent searches" issue. If we reduce the number of CPUs, we'll see that warning even more often. Is it really a bad idea to override that formula in the config file and manually increase the max concurrent searches allowed?
Thanks!
(And before someone recommends it, moving Splunk to a physical server is not an option for us, nor is using direct-attached storage.)
Officially, Splunk recommends that if you're going to virtualize, you should use as many virtual cores/CPUs as you would physical ones. Furthermore, to avoid the problem you note (having to wait for 8 CPUs to come free before the VM can run anything), the CPU resources should be dedicated to the VM, i.e., always available because they can't be used for anything else. I like to say that you shouldn't be looking to save hardware by virtualizing Splunk, only to gain flexibility.
This is because we find that Splunk actually does wind up using and needing the physical CPU resources. Now, it is possible that right now you are at a place where you only need 4 cores for your workload, and in that case you may find that you run into less CPU wait if you reduce the number of cores. But it should be noted that when Splunk uses the CPUs, it will actually use the CPUs, and you need to have the underlying physical CPUs available to perform the work. People like to get all cranky about waiting for free CPUs in VMware, but you may find that you actually need the physical resources to do the search work, so be prepared to bump it back up. (Also, I suspect that you may not be seeing problems because the CPUs are in fact busy enough that the VM never lets go of its 8 CPUs.)
As for the messages you are seeing about "maximum concurrent searches": first, you should be sure that the messages are not coming from user role restrictions, but are in fact about the global system maximum. If that is the case, and you really aren't seeing problems, you should feel free to increase either the base or the multiplier for the number of searches per CPU. The default in 5.x is 2 + 2 x cores, but you can raise the multiplier to 3x or possibly even 4x cores. Depending on your search profile, you may well be able to run more of them, though a different workload may cause searches to backlog.
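To make the arithmetic concrete, here is a minimal Python sketch of the base-plus-multiplier formula described above. It assumes the 2 + 2 x cores default quoted in this answer. The function and parameter names are made up for illustration and are not Splunk settings. In limits.conf the corresponding knobs are, as far as I know, `base_max_searches` and `max_searches_per_cpu` in the `[search]` stanza, so check the limits.conf spec for your version before changing anything.

```python
def max_concurrent_searches(cores: int, per_cpu: int = 2, base: int = 2) -> int:
    """Global concurrent-search ceiling modeled as base + per_cpu * cores.

    The defaults (base=2, per_cpu=2) follow the "2 + 2 x cores" figure
    quoted in this thread; they are an assumption, not a verified default.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return base + per_cpu * cores


if __name__ == "__main__":
    # Compare the current 8-core VM with a 4-core VM at several multipliers.
    for cores, per_cpu in [(8, 2), (4, 2), (4, 3), (4, 4)]:
        limit = max_concurrent_searches(cores, per_cpu)
        print(f"{cores} cores, {per_cpu} searches per core -> limit {limit}")
```

Under these assumptions, dropping from 8 to 4 cores lowers the ceiling from 18 to 10, and raising the multiplier to 4x on 4 cores brings it back to 18. The limit is then the same, but there are half as many CPUs behind it, which is where the backlog risk comes from.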
Thank you, all, for your assistance with this question!
Do you mostly use monitor stanzas for the log monitoring? Which parameters are being monitored?
I should have been more specific, oops! I'm running Splunk 5.0.2 on Linux.
What version of Splunk are you running? The answer to your question may depend on that.
We are running Splunk in VMs but we don't face this issue. It really depends on what task is being done on the system.
What is the operating system? I would say there are some unnecessary things running in the background; get rid of those. Plan for scheduled jobs that are not real-time. Index maintenance takes a lot of CPU as data comes into Splunk.
You have 8 CPUs. If they support virtualization with hardware threads (hyper-threading), then 16 searches can run at any one moment, so I don't see why resources would be consumed this heavily. Use the Splunk on Splunk (SOS) app to see where the CPU is being consumed, and then decide.
