Solved: Preferential Load Balancing from Universal Forwarder...

jatkb · 10-10-2024

We are looking to deploy Edge Processors (EP) in a high-availability configuration, with 2 EP systems per site and multiple sites. We need to use Edge Processors (or Heavy Forwarders, I guess?) to ingest and filter/transform the event logs before they leave our environment and go to our MSSP Splunk Cloud.

Ideally, I want the Universal Forwarders (UF) to use the local site EPs. However, if those are unavailable, I would like the UFs to fail over to the EPs at another site.

I do not want to have the UFs use the EPs at another site by default, as this will increase WAN costs, so I can't simply list all the servers in the defaultGroup.

For example:

[tcpout]
defaultGroup=site_one_ingest

[tcpout:site_one_ingest]
disabled=false
server=10.1.0.1:9997,10.1.0.2:9997

[tcpout:site_two_ingest]
disabled=true
server=10.2.0.1:9997,10.2.0.2:9997

Is there any way to configure the UFs to prefer the local Edge Processors (site_one_ingest), but then fail over to the second site (site_two_ingest) if those systems are not available?

Is it also possible for the configuration to support automated failback/recovery?

PickleRick · 10-11-2024

If you define multiple output groups, events are pushed to all of them at the same time (unless you override the routing per input or in a transform).
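
As a rough illustration of the per-input override mentioned above, an inputs.conf stanza can pin a single input to one output group with _TCP_ROUTING. The monitored path below is a made-up placeholder; the group name is the one from the question.

```ini
# inputs.conf on the UF (sketch only; the monitor path is hypothetical)
[monitor:///var/log/example/app.log]
# Send this input only to the named tcpout group rather than the defaultGroup
_TCP_ROUTING = site_one_ingest
```

Note that this only selects which group receives the data. It does not add any priority or failover between groups.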

If you have multiple destination hosts in an output group, they are handled in a round-robin fashion. There's no other way using built-in mechanics.

You'd need to either use HTTP output and install an intermediate HTTP reverse proxy with health-checked and prioritized backends, or do some form of external "switching" of the destination based on either dynamic network-level redirects or DNS-based mechanisms. But all of those are generally non-Splunk solutions and add complexity to your deployment.
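
A minimal sketch of what the DNS-based variant could look like on the UF side, assuming a hypothetical name (ep-ingest.site1.example.internal) that some external, health-checked DNS or GSLB layer points at the local EPs and repoints at the other site when they are down. The health checking and failback logic would live entirely in that external layer, not in Splunk.

```ini
# outputs.conf on the UF (sketch only; the hostname and group name are hypothetical)
[tcpout]
defaultGroup = site_ingest

[tcpout:site_ingest]
# Name managed by the external, health-checked DNS layer
server = ep-ingest.site1.example.internal:9997
# How often the forwarder re-resolves DNS names, in seconds (default is 300)
dnsResolutionInterval = 60
```

How quickly a UF actually follows a DNS change depends on the record TTL, the resolver in between, and this interval, so it would need testing in your environment before relying on it.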

gcusello · 10-10-2024

Hi @jatkb ,

Usually, connections between Splunk systems are configured with auto load balancing, so you get load distribution and failover management across the receivers (both HFs and IDXs):

[tcpout]
defaultGroup=autoloadbalancing

[tcpout:autoloadbalancing]
disabled=false
server=10.1.0.1:9997, 10.1.0.2:9997, 10.2.0.1:9997, 10.2.0.2:9997

Otherwise, I don't think it's possible to have automatic failover management.

Ciao.

Giuseppe
