Getting Data In

Issues due to network slowness



We have deployed the Job Scheduler, Indexer, Search Head, and Forwarder on virtual machines. We often see issues such as:
1. The indexer goes down and is unable to distribute to peers.
2. Crash logs appear on the indexer.
3. Splunk stops running on the Job Scheduler node.
4. Many Splunk helper processes are running (the process count spikes to 15 and then fluctuates between 5 and 10).

Earlier we did not have any of these issues. Recently the network has been slow and is sometimes unresponsive for a couple of minutes (it takes more than a minute to establish a connection to a VM).

What kinds of issues can arise in a Splunk deployment when the network is slow? Could the issues I mentioned above be caused by slow network speed?



There is a Splunk App for Boundary that may be useful. Boundary monitors the network flows between all of the VMs; it's useful for identifying whether a hotspot is in the app, the VM, or the network.


OK, there are a few points to go through here:

Every time you run a search, it spawns a new splunkd process, and each process can consume a CPU core. From what you've said, you need to be sure the host can support 15+ concurrent instances. The spike in process count is most likely caused by users running ad hoc searches combined with whatever scheduled searches you have.
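To see how many splunkd processes are running at a given moment, a quick check like the following can help (a sketch; the `splunkd` process-name match is an assumption and may need adjusting for your platform):

```shell
# Count running splunkd processes by matching the command name from ps.
# On a host without Splunk this prints 0; on a busy search head or
# indexer you'd expect one core process plus helpers per active search.
count=$(ps -eo comm= | grep -c 'splunkd' || true)
echo "splunkd processes: $count"
```

Sampling this in a loop while the count fluctuates between 5 and 10 will show whether the helpers line up with user or scheduled search activity.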

The old adage that running Splunk on VMs is bad is a little out of date nowadays. If deployed correctly you won't experience too many issues. The key is ensuring that the indexer can get about 800 IOPS (really, I'd aim for 1200) from whatever storage it has; giving it native read/write access to a disk can help with this. VMs are great for Splunk because they make deploying search heads quite simple: just ensure the VM has some I/O headroom and then load up the CPUs.
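A rough way to sanity-check write latency on the indexer's storage is a synchronous dd run (a Linux/GNU-coreutils sketch; for a proper random-read IOPS measurement you'd want a dedicated tool such as fio, and the 800/1200 figures above are the target to compare against):

```shell
# Write 1000 x 4 KB blocks, syncing after each write (oflag=dsync).
# The reported rate gives a crude upper bound on sustainable
# synchronous write IOPS for whatever volume mktemp lands on.
tmpfile=$(mktemp)
result=$(dd if=/dev/zero of="$tmpfile" bs=4k count=1000 oflag=dsync 2>&1 | tail -n 1)
rm -f "$tmpfile"
echo "$result"
```

Run it on the volume that holds the indexer's hot/warm buckets, not on a different disk, or the number tells you nothing about indexing performance.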

Finally, are these all on the same host? How many network ports does the box have? If it's a single port and you've got data flowing over it into the indexer, TCP ACKs coming back out, users connecting to run searches, and so on, it's quite possible you're running it near capacity. Have you checked any of the box's performance metrics?
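To check whether a NIC is near capacity, you can sample the kernel's per-interface byte counters and compare two readings taken a few seconds apart (a Linux-only sketch reading /proc/net/dev; interface names are environment-specific):

```shell
# Print interface name, cumulative RX bytes, and cumulative TX bytes.
# Sample twice, N seconds apart, and divide the deltas by N to get
# bytes/second per interface; compare against the port's line rate.
nic_sample=$(awk 'NR > 2 { gsub(":", "", $1); print $1, "rx_bytes=" $2, "tx_bytes=" $10 }' /proc/net/dev)
echo "$nic_sample"
```

If the per-second delta on the data port approaches the link speed, the "unresponsive for a couple of minutes" symptom and the indexer distribution failures would both be consistent with network saturation.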



Yes, they are all on the same ESX server. The ESX box has two ports. When I checked the box's performance metrics, it sits around 20% most of the time. Sometimes there is a steep rise to 98.77% memory usage, and that is when I see crash logs on the indexer.



In general, running Splunk in a virtualized environment is not a good idea.
