Hi Everyone,
Any help would be appreciated. We have four Splunk instances that work together.
All four servers are virtual machines running Red Hat Enterprise Linux 8 and Splunk Enterprise 8.2.2. vCenter is 6.7, with four ESXi hosts also running 6.7.
The four Splunk VMs are consuming a very high amount of CPU at all times:
45.8 GHz
83.44 GHz
45.6 GHz
83.82 GHz
This is basically running our ESXi hosts at full capacity. However, when I log on to each server and run the top -i command, each one reports very low CPU usage.
Does anyone have any recommendations? Any help would be greatly appreciated.
Thank you,
Hi
When you are running Splunk on VMware VMs, I have seen cases where reserving too many vCores and too much memory for individual VMs leads to exactly this result. Remember that when VMware schedules those VMs, ESXi has to do extra work (cleaning memory, etc.) before it can start running the "next" VM. This is especially true if you have overcommitted your ESXi hosts; that is often recommended in general, but with Splunk nodes it is an easy way to shoot yourself in the foot ;-(
You should try to decrease the number of vCores and the amount of memory if possible, and then check the situation again. I usually try to give VMs as few resources as possible and increase them only when needed.
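One way to see whether ESXi itself is reclaiming or throttling resources is to check from both sides. This is only a rough sketch, assuming open-vm-tools is installed on the RHEL guests and that you have shell access to the ESXi hosts; the commands are plain open-vm-tools and esxtop usage, nothing Splunk-specific.

    # Inside each RHEL guest (requires open-vm-tools):
    vmware-toolbox-cmd stat balloon    # MB of guest memory currently ballooned by ESXi
    vmware-toolbox-cmd stat swap       # MB of guest memory swapped by the hypervisor
    vmware-toolbox-cmd stat cpures     # CPU reservation (MHz) granted to this VM
    vmware-toolbox-cmd stat cpulimit   # CPU limit (MHz), if one is set

    # On each ESXi host, watch scheduling contention live:
    esxtop                             # press 'c' for the CPU view
    # %RDY  = time a vCPU was ready to run but waiting for a physical core
    # %CSTP = co-stop time; high values usually mean too many vCPUs per VM

If %RDY or %CSTP are high, the hosts are spending effort on scheduling overhead rather than on the guests' own work, which would fit the mismatch between what vCenter and top report.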
r. Ismo
Hello,
Interesting suggestion. I believe it is worth a try, and if it solves the problem I will let you know.
However, I do see one possible flaw in this approach: I may cut the CPU cores in half only to find that the VMs keep maxing out whatever resources are available, in which case I would have to increase them back to where they were originally.
Definitely something to try. Also make sure that vCores vs. sockets vs. threads are defined sensibly (whatever that means in your environment). Switching from one socket to another, or using threads from two or more sockets, is more expensive for a single VM than keeping everything on one socket.
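A quick way to see how the virtual topology actually looks from inside the guest is below; just a sketch assuming a standard RHEL 8 install (numactl may need to be installed separately).

    # Inside the guest: how are the vCPUs presented (sockets vs. cores vs. threads)?
    lscpu | egrep 'Socket|Core|Thread|NUMA'
    # NUMA nodes and how much memory sits behind each one
    numactl --hardware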
Also remember that cleaning memory when the hypervisor switches from one VM to another is quite an expensive task. For that reason, never overbook those resources for Splunk nodes.
And one last thing: ensure that you have enough IOPS on all nodes (especially the indexers) at the same time. It's not enough that one peer gets 1200+ while another gets only 200, because all of them are needed simultaneously when a search starts!
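If you want to measure this, something like the fio run below gives a rough random read/write IOPS figure per node. The file path, size and runtime are only example values (I am assuming /opt/splunk is where your indexes live), and it should point at a scratch file, never at live index data.

    # Rough 4k random read/write IOPS test (package: fio); delete the test file afterwards.
    fio --name=splunk-iops-test --filename=/opt/splunk/var/fio.test \
        --rw=randrw --rwmixread=50 --bs=4k --direct=1 --ioengine=libaio \
        --iodepth=32 --size=2G --runtime=60 --time_based --group_reporting

Compare the results across all peers; they should be in the same ballpark and meet Splunk's reference hardware IOPS recommendations for your workload.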
If I recall correctly, there are (or were) some VMware white papers that go through this at a deeper level.
r. Ismo
Hi,
Can you please describe your deployment? I.e., what roles the servers have, how many indexers there are, whether they are clustered, and whether you are using ES/ITSI.
Regards
theTech
Hi,
Thanks for responding. There is only one indexer.
I have four servers:
1A - Cluster Master
2A - Indexer
3A - Heavy Forwarder
4A - Search Head
Using ES.
Please let me know if you have any recommendations.