Getting Data In

Preferential Load Balancing from Universal Forwarders to HF/EPs

jatkb
Engager

We are looking to deploy Edge Processors (EPs) in a high-availability configuration, with 2 EP systems per site across multiple sites. We need to use Edge Processors (or Heavy Forwarders, I guess?) to ingest and filter/transform the event logs before they leave our environment and go to our MSSP Splunk Cloud.

Ideally, I want the Universal Forwarders (UFs) to use the local site EPs. However, if those are unavailable, I would like the UFs to fail over to the EPs at another site.

I do not want to have the UFs use the EPs at another site by default, as this will increase WAN costs, so I can't simply list all the servers in the defaultGroup.

For example:

[tcpout]
defaultGroup=site_one_ingest

[tcpout:site_one_ingest]
disabled=false
server=10.1.0.1:9997,10.1.0.2:9997

[tcpout:site_two_ingest]
disabled=true
server=10.2.0.1:9997,10.2.0.2:9997

Is there any way to configure the UFs to prefer the local Edge Processors (site_one_ingest), but fail over to the second site (site_two_ingest) if those systems are not available?

Is it also possible for the configuration to support automated failback/recovery?

1 Solution

PickleRick
SplunkTrust

If you define multiple output groups, events are pushed to all of them at the same time (unless you override the routing per input or in a transform).

If you have multiple destination hosts in an output group, they are handled in a round-robin fashion. There's no other way using built-in mechanics.

You'd need to either use HTTP output and install an intermediate HTTP reverse proxy with health-checked and prioritized backends, or do some form of external "switching" of the destination based on either dynamic network-level redirects or DNS-based mechanisms. But all of those are generally non-Splunk solutions and add complexity to your deployment.
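To illustrate the "health-checked and prioritized backends" idea, here is a minimal sketch of the selection logic such an external mechanism might run on a schedule. It is not a Splunk feature: `SITES`, `tcp_probe`, and `pick_targets` are hypothetical names, the addresses are taken from the question, and a plain TCP connect is assumed to be an acceptable health check (it only shows the port is open, not that the receiver is healthy). How the result is applied (a DNS record update, a proxy backend change, and so on) is left out.

```python
import socket
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical site layout, listed in order of preference (local site first).
SITES: List[Tuple[str, List[str]]] = [
    ("site_one_ingest", ["10.1.0.1:9997", "10.1.0.2:9997"]),
    ("site_two_ingest", ["10.2.0.1:9997", "10.2.0.2:9997"]),
]


def tcp_probe(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds."""
    host, port = address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def pick_targets(
    sites: Sequence[Tuple[str, List[str]]],
    probe: Callable[[str], bool] = tcp_probe,
) -> Tuple[Optional[str], List[str]]:
    """Return the first site (in priority order) with a reachable server.

    The result is (site name, reachable servers), or (None, []) when
    nothing answers.
    """
    for name, servers in sites:
        alive = [s for s in servers if probe(s)]
        if alive:
            return name, alive
    return None, []


# Simulated outage: both site-one receivers are down, so site two is chosen.
down = {"10.1.0.1:9997", "10.1.0.2:9997"}
print(pick_targets(SITES, probe=lambda s: s not in down))
```

Because every run walks the list from the top, the preferred site is picked again as soon as one of its receivers answers, which gives fail-back without extra state.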

gcusello
SplunkTrust

Hi @jatkb ,

Usually, connections between Splunk systems are configured with auto load balancing, so you get load distribution and failover handling across the receivers (whether HFs or IDXs):

[tcpout]
defaultGroup=autoloadbalancing

[tcpout:autoloadbalancing]
disabled=false
server=10.1.0.1:9997, 10.1.0.2:9997, 10.2.0.1:9997, 10.2.0.2:9997

Otherwise, I don't think automatic failover management is possible.

Ciao.

Giuseppe
