Hi Everyone,
Any help would be appreciated. We have four Splunk instances that work together.
All four servers are virtual machines running Red Hat Enterprise Linux 8 and Splunk Enterprise 8.2.2. vCenter is 6.7, with four ESXi hosts also running 6.7.
The four Splunk VMs are consuming a very high amount of CPU at all times:
45.8 GHz
83.44 GHz
45.6 GHz
83.82 GHz
This is basically running our ESXi hosts at full capacity. However, when I log on to each server and run the top -i command, each one reports very low CPU usage.
Does anyone have any recommendations? Any help would be greatly appreciated.
Thank you,
Hi
When you are running Splunk on VMware VMs, I have seen cases where reserving too many vCores and too much memory for individual VMs leads to exactly this result. Remember that when VMware schedules those VMs, ESXi has to do extra work (cleaning memory, etc.) before it can start running the "next" VM. This is especially true if you have overcommitted your ESXi hosts; that is often recommended in general, but with Splunk nodes it is an easy way to shoot yourself in the foot ;-(
You should try to decrease the number of vCores and the amount of memory if possible, and then check the situation again. I usually try to give VMs as few resources as possible and increase them only when needed.
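One way to see whether ESXi itself is reclaiming or throttling resources is to check from both sides. This is only a rough sketch, assuming open-vm-tools is installed on the RHEL guests and that you have shell access to the ESXi hosts; the commands are plain open-vm-tools and esxtop usage, nothing Splunk-specific.

    # Inside each RHEL guest (requires open-vm-tools):
    vmware-toolbox-cmd stat balloon    # MB of guest memory currently ballooned by ESXi
    vmware-toolbox-cmd stat swap       # MB of guest memory swapped by the hypervisor
    vmware-toolbox-cmd stat cpures     # CPU reservation (MHz) granted to this VM
    vmware-toolbox-cmd stat cpulimit   # CPU limit (MHz), if one is set

    # On each ESXi host, watch scheduling contention live:
    esxtop                             # press 'c' for the CPU view
    # %RDY  = time a vCPU was ready to run but waiting for a physical core
    # %CSTP = co-stop time; high values usually mean too many vCPUs per VM

If %RDY or %CSTP are high, the hosts are spending effort on scheduling overhead rather than on the guests' own work, which would fit the mismatch between what vCenter and top report.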
r. Ismo
Hello,
Interesting suggestion. I believe it is worth a try, and if it solves the problem I will let you know.
However, I do see one possible flaw in this approach: I may cut the CPU cores in half only to find that the VMs keep maxing out whatever resources are available, in which case I would have to increase them back to where they were originally.
Definitely something to try. Also make sure that vCores vs. sockets vs. threads are defined sensibly (whatever that means in your environment). Switching from one socket to another, or using threads from two or more sockets, is more expensive for a single VM than keeping everything on one socket.
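A quick way to see how the virtual topology actually looks from inside the guest is below; just a sketch assuming a standard RHEL 8 install (numactl may need to be installed separately).

    # Inside the guest: how are the vCPUs presented (sockets vs. cores vs. threads)?
    lscpu | egrep 'Socket|Core|Thread|NUMA'
    # NUMA nodes and how much memory sits behind each one
    numactl --hardware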
Also remember that cleaning memory when the hypervisor switches from one VM to another is quite an expensive task. For that reason, never overbook those resources for Splunk nodes.
And one last thing: ensure that you have enough IOPS on all nodes (especially the indexers) at the same time. It's not enough that one peer gets 1200+ while another gets only 200, because all of them are needed simultaneously when a search starts!
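If you want to measure this, something like the fio run below gives a rough random read/write IOPS figure per node. The file path, size and runtime are only example values (I am assuming /opt/splunk is where your indexes live), and it should point at a scratch file, never at live index data.

    # Rough 4k random read/write IOPS test (package: fio); delete the test file afterwards.
    fio --name=splunk-iops-test --filename=/opt/splunk/var/fio.test \
        --rw=randrw --rwmixread=50 --bs=4k --direct=1 --ioengine=libaio \
        --iodepth=32 --size=2G --runtime=60 --time_based --group_reporting

Compare the results across all peers; they should be in the same ballpark and meet Splunk's reference hardware IOPS recommendations for your workload.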
If I recall correctly, there are (or were) some VMware white papers that go through this at a deeper level.
r. Ismo
Hi,
Can you please describe your deployment? I.e., what roles the servers have, how many indexers there are, whether they are clustered, and whether you are using ES/ITSI.
Regards
theTech
Hi,
Thanks for responding. There is only one indexer.
I have four servers:
1A - Cluster Master
2A - Indexer
3A - Heavy Forwarder
4A - Search Head
Using ES.
Please let me know if you have any recommendations.