Getting Data In

How to Ensure Proper Load Balancing Across Multiple Heavy Forwarders?

mcfabrero_acn
Explorer

Hi,

We’re currently facing a load imbalance issue in our Splunk deployment and would appreciate any advice or best practices.

Current Setup:

Universal Forwarders (UFs) → Heavy Forwarders (HFs) → Cribl

We originally had 8 HFs handling parsing and forwarding.

Recently, we added 6 new HFs (total of 14 HFs) to help distribute the load more evenly and to offload congested older HFs.

All HFs are included in the UFs’ outputs.conf under the same TCP output group.

Issue: 

We’re seeing that some of the original 8 HFs are still showing blocked=true in metrics.log (splunktcpin queue full), while the newly added HFs receive little to no traffic.

It looks like the load is not being evenly distributed across the available HFs.

Here's our current outputs.conf deployed in UFs:

[tcpout]
defaultGroup = HF_Group
forwardedindex.2.whitelist = (_audit|_introspection|_internal)

[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997

We have not set autoLBFrequency yet.

Questions:

Do we need to set autoLBFrequency in order to achieve true active load balancing across all 14 HFs, even when none of them are failing?

If we set autoLBFrequency = 30, are there any potential downsides (e.g., performance impact, TCP session churn)?

Are there better or recommended approaches to ensure even distribution of UF traffic across multiple HFs before forwarding to Cribl?

Please note that we are sending a large volume of data, primarily Windows event logs.

Your help is very much appreciated. Thank you


isoutamo
SplunkTrust
You should use asynchronous forwarding to help with this situation. Here is one article describing it: https://splunk.my.site.com/customer/s/article/Asynchronous-Forwarding-to-Splunk
There are plenty of other articles about it, which you can easily find with your favorite search engine.
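Based on that article, the relevant settings go in outputs.conf on the UFs. A minimal sketch (the values here are illustrative, not prescriptive — size them to your environment; the key idea is that connectionTTL keeps the old socket open briefly while the forwarder switches targets, which is what makes the forwarding asynchronous):

# outputs.conf on the UFs -- illustrative values only
[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997
# switch targets every 30 seconds instead of waiting for EOF or a failure
autoLBFrequency = 30
# keep the previous connection open while the new one is established,
# so events keep flowing during the switch
connectionTTL = 300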

PickleRick
SplunkTrust
SplunkTrust

Yup. The so-called asynchronous forwarding, or asynchronous load balancing, helps greatly in reducing imbalance in data distribution. Without it, when using plain time-based LB, a forwarder sends to one output for a specified period of time, then switches to another, then to another. But at any given point in time it only sends to one output (unless you're using multiple ingestion pipelines, in which case you will have multiples of this setup).

And, on the subject of those pipelines: since you have a separate HF layer, you might want to try increasing your pipeline count if you have spare resources (mostly CPU) on your HFs. You then need to adjust your load-balancing parameters accordingly.
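Increasing the pipeline count is done in server.conf on the forwarder. A sketch (2 is just an example value — each extra pipeline costs CPU, so size it to what your HFs have spare):

# server.conf on the HF -- example value only
[general]
# each pipeline gets its own queues and its own output connection,
# so 2 pipelines send to 2 targets concurrently
parallelIngestionPipelines = 2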


livehybrid
SplunkTrust

Hi @mcfabrero_acn 

Yes, you could set autoLBFrequency to achieve active load balancing from your UFs to all Heavy Forwarders.

[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997
autoLBFrequency = 30

The other option is to use a volume-based LB configuration - it's worth checking out https://help.splunk.com/en/splunk-enterprise/forward-and-process-data/forwarding-and-receiving-data/...to see which would be more appropriate for your use case.
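A volume-based setup would look roughly like this (a sketch; the threshold is illustrative — as I understand it, the forwarder switches targets once autoLBVolume bytes have been sent, or when autoLBFrequency expires, whichever comes first):

[tcpout:HF_Group]
server = HF1:9997,HF2:9997,...HF14:9997
# switch after roughly 1 MB has been sent to the current HF,
# or after 30 seconds, whichever comes first
autoLBVolume = 1048576
autoLBFrequency = 30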

The potential downsides of autoLBFrequency are TCP connection churn (new connections created every 30 seconds) and a *slight* performance overhead from connection establishment, though I wouldn't expect this to be too noticeable.

Check out https://community.splunk.com/t5/Getting-Data-In/Universal-Forwarder-not-load-balancing-to-indexers/m... which might also help.

The other thing to consider is an increased number of pipelines - but again, it's worth understanding the implications of this and considering your available processing resources on the UFs/HFs. Are you currently using the default of 1? See https://docs.splunk.com/Documentation/Splunk/latest/Admin/Serverconf#Remote_applications_configurati... for more info.

Finally - what is the datasource into your UFs? Sometimes sources like syslog can make it tricky to LB effectively.
