Getting Data In

How to Ensure Proper Load Balancing Across Multiple Heavy Forwarders?

mcfabrero_acn
Explorer

Hi,

We’re currently facing a load imbalance issue in our Splunk deployment and would appreciate any advice or best practices.

Current Setup:

Universal Forwarders (UFs) → Heavy Forwarders (HFs) → Cribl

We originally had 8 HFs handling parsing and forwarding.

Recently, we added 6 new HFs (total of 14 HFs) to help distribute the load more evenly and to offload congested older HFs.

All HFs are included in the UFs’ outputs.conf under the same TCP output group.

Issue: 

We’re seeing that some of the original 8 HFs are still showing blocked=true in metrics.log (splunktcpin queue full), while the newly added HFs have little to no traffic.

It looks like the load is not being evenly distributed across the available HFs.

Here's our current outputs.conf deployed in UFs:

[tcpout]
defaultGroup = HF_Group
forwardedindex.2.whitelist = (_audit|_introspection|_internal)

[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997

We have not set autoLBFrequency yet.

Questions:

Do we need to set autoLBFrequency in order to achieve true active load balancing across all 14 HFs, even when none of them are failing?

If we set autoLBFrequency = 30, are there any potential downsides (e.g., performance impact, TCP session churn)?

Are there better or recommended approaches to ensure even distribution of UF traffic across multiple HFs in an environment like this, before the data is forwarded to Cribl?

Please note that we are sending a large volume of data, primarily wineventlogs.

Your help is very much appreciated. Thank you


isoutamo
SplunkTrust
You should use asynchronous forwarding to help with this situation. Here is one set of instructions for it: https://splunk.my.site.com/customer/s/article/Asynchronous-Forwarding-to-Splunk
There are a lot of other articles about it, which you can easily find with your favorite search engine.
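
For reference, here is a rough sketch of what an asynchronous forwarding setup might look like in outputs.conf on the UFs. The setting names (autoLBVolume, maxQueueSize, connectionTTL) are real outputs.conf settings, but every value below is an illustrative placeholder and not a recommendation taken from the linked article, so please check the article and the outputs.conf spec before deploying anything like this.

```ini
[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997

# Switch to another HF after roughly this many bytes have been sent
# (placeholder value, assumed here to be 1 MB).
autoLBVolume = 1048576

# Output queue size on the forwarder. The assumption in this sketch is
# that it should be several times larger than autoLBVolume.
maxQueueSize = 10MB

# How long (in seconds) to keep the previous connection open after
# switching to a new HF (placeholder value).
connectionTTL = 300
```

The idea is that the forwarder moves on to the next HF after a fixed volume of data while the previous connection is still draining, so several HFs receive data at roughly the same time instead of one at a time.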

PickleRick
SplunkTrust

Yup. So-called asynchronous forwarding, or asynchronous load balancing, helps greatly in reducing imbalance in data distribution. Without it, when using only time-based LB, an HF sends to one indexer (or, in your setup, a UF sends to one HF) for a specified period of time, then switches to another, then to another. But at any given point in time it sends to only one output (unless you're using multiple ingestion pipelines, in which case you will have multiples of this setup).

And on the subject of pipelines: since you have a separate HF layer, you might want to increase your pipeline count if you have spare resources (mostly CPU) on your HFs. You will need to adjust your load-balancing parameters accordingly.
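
As a minimal sketch of the pipeline change, assuming the HFs have spare CPU: the pipeline count is set with parallelIngestionPipelines in server.conf. The value 2 here is only an example, and a restart is needed for it to take effect.

```ini
# server.conf on the HF (example value only; size it to available CPU)
[general]
parallelIngestionPipelines = 2
```

Each pipeline has its own output processor, so with 2 pipelines the instance can be sending to two different receivers at the same moment.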


livehybrid
SplunkTrust

Hi @mcfabrero_acn 

Yes, you could set autoLBFrequency to achieve active load balancing across the output of your UFs to all Heavy Forwarders.

[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997
autoLBFrequency = 30

The other option is to use a volume-based LB configuration. It's worth checking out https://help.splunk.com/en/splunk-enterprise/forward-and-process-data/forwarding-and-receiving-data/... to see which would be more appropriate for your use case.

The potential downside of autoLBFrequency is TCP connection churn: with new connections created every 30 seconds, there could be a *slight* performance overhead from connection establishment costs. However, I wouldn't expect this to be too noticeable.

Check out https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarder-not-load-balancing-to-indexers/m... which might also help.

The other thing to consider is increasing the number of pipelines, but again it's worth understanding the implications of this and considering the processing resources available on your UFs/HFs. Are you currently using the default of 1? See https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf#Remote_applications_configurati... for more info.

Finally, what is the data source for your UFs? Sometimes sources like syslog can make it tricky to LB effectively.
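
If a continuous source such as syslog turns out to be the problem, one thing that may help is enabling the event breaker on the UF, so that it can find a safe event boundary at which to switch receivers. This is only a sketch: the sourcetype name below is hypothetical, and the regex assumes single-line events, so adjust both to your data.

```ini
# props.conf on the UF
# "my_syslog_sourcetype" is a made-up name for illustration.
[my_syslog_sourcetype]
EVENT_BREAKER_ENABLE = true
# Assumes one event per line.
EVENT_BREAKER = ([\r\n]+)
```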
