We are looking to deploy Edge Processors (EP) in a high availability configuration - with 2 EP systems per site and multiple sites. We need to use Edge Processors (or Heavy Forwarders, I guess?) to ingest and filter/transform the event logs before they leave our environment and go to our MSSP Splunk Cloud.
Ideally, I want the Universal Forwarders (UF) to use the local site EPs. However, in the case that those are unavailable, I would like the UFs to failover to use the EPs at another site.
I do not want to have the UFs use the EPs at another site by default, as this will increase WAN costs, so I can't simply list all the servers in the defaultGroup.
For example:
[tcpout]
defaultGroup=site_one_ingest
[tcpout:site_one_ingest]
disabled=false
server=10.1.0.1:9997,10.1.0.2:9997
[tcpout:site_two_ingest]
disabled=true
server=10.2.0.1:9997,10.2.0.2:9997
Is there any way to configure the UFs to prefer the local Edge Processors (site_one_ingest), but then to failover to the second site (site_two_ingest) if those systems are not available?
Is it also possible for the configuration to support automated failback/recovery?
If you define multiple output groups, events are pushed to all of them at the same time (unless you override the routing per input or in transform).
If you have multiple destination hosts in an output group, they are handled in a round robin way. There's no other way using built-in mechanics.
You'd need to either use HTTP output and install an intermediate HTTP reverse proxy with health-checked and prioritized backends, or do some form of external "switching" of the destination based on either dynamic network-level redirects or DNS-based mechanisms. But all of those are generally non-Splunk solutions and add complexity to your deployment.
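To make the external "switching" idea a bit more concrete, here is a minimal Python sketch of the decision logic only. It is not a Splunk feature: the function names (`tcp_reachable`, `choose_servers`, `render_outputs_conf`) and the `site_ingest` group name are made up for illustration. It assumes a scheduled job on or near the forwarder probes the local receivers, and that something outside this sketch writes the result to outputs.conf and reloads the forwarder. A successful TCP connect is also only a rough health signal.

```python
import socket
from typing import Callable, List, Sequence


def tcp_reachable(hostport: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within the timeout."""
    host, port = hostport.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def choose_servers(
    local: Sequence[str],
    remote: Sequence[str],
    probe: Callable[[str], bool] = tcp_reachable,
) -> List[str]:
    """Prefer the local site while at least one local receiver answers.

    The full local list is returned so the forwarder's own load balancing
    still handles a single failed receiver; the remote site is used only
    when no local receiver responds.
    """
    if any(probe(server) for server in local):
        return list(local)
    return list(remote)


def render_outputs_conf(servers: Sequence[str], group: str = "site_ingest") -> str:
    """Render a single-group tcpout configuration for the chosen servers."""
    return (
        "[tcpout]\n"
        f"defaultGroup={group}\n"
        "\n"
        f"[tcpout:{group}]\n"
        f"server={','.join(servers)}\n"
    )
```

Because the choice is recomputed on every run, failback happens on the first run after a local receiver becomes reachable again. The trade-off is the one already mentioned: this is extra moving machinery outside Splunk that you have to deploy and monitor yourself.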
Hi @jatkb ,
Usually, connections between Splunk systems are configured with automatic load balancing, so you get load distribution and failover management across the receivers (whether HFs or IDXs):
[tcpout]
defaultGroup=autoloadbalancing
[tcpout:autoloadbalancing]
disabled=false
server=10.1.0.1:9997, 10.1.0.2:9997, 10.2.0.1:9997, 10.2.0.2:9997
Otherwise, I don't think automatic failover management is possible.
Ciao.
Giuseppe