
Forwarding events to an indexer cluster using an AWS ELB

kdoonan
Explorer

Hey all,

We have many different forwarders installed on different hosts and we are migrating to a cluster of indexers. I've read about load balancing on the forwarders and from what I gather, if indexer acknowledgement is not turned on then there's a chance that the forwarder could try to forward to an inactive node?

To get around this, I was thinking about using an AWS ELB with a short session stickiness window (say 30-60 seconds) and a TCP health check on the indexer port (so that if an indexer node goes down, it stops receiving traffic from the ELB).
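For concreteness, the forwarder-side outputs.conf I'm picturing would point everything at a single ELB DNS name, roughly like the sketch below (the hostname and port are placeholders, not real values):

# outputs.conf on each forwarder - all traffic goes through the ELB
[tcpout]
defaultGroup = elb_group
useACK = false

[tcpout:elb_group]
server = splunk-idx-elb.example.internal:9997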

So I have a few questions:

  1. Are my assumptions about forwarder load balancing with indexer acknowledgement turned off correct?
  2. Has anyone tried something similar with an ELB and were there any issues with it?
  3. Is there a good reason why I shouldn't enable indexer acknowledgment on all my forwarders? I've read that it has some performance impact, but how much impact?

Thanks in advance.


pwmcity
Path Finder

Hey, any updates on this? We're about to set up something similar and I'm going through the same thought process. Originally I figured having an ELB would mean managing fewer public IPs, which would make security groups easier to manage. Based on dwaddle's answer below, I'm thinking I'll instead expose a couple of intermediate forwarders, create a DNS A record for each, and then use Splunk's built-in load balancing. Would love to hear your thoughts.
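Roughly what I have in mind for the universal forwarders' outputs.conf, with the intermediate forwarders' A records as the targets (hostnames here are placeholders):

[tcpout]
defaultGroup = intermediate_hf

[tcpout:intermediate_hf]
server = hf1.example.com:9997, hf2.example.com:9997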


dwaddle
SplunkTrust

Yes, intermediate forwarders would be much better than an ELB ... if for no other reason than Splunk support will actually support that configuration 🙂


dwaddle
SplunkTrust

In short, "don't do this". Let's start with the thought that it's basically an unsupported configuration. The forwarders don't require the help of an external load balancer to connect them to indexers - they are able to round-robin across indexers just fine on their own.

The built-in forwarder load balancing does just fine without indexer acknowledgement enabled. When an indexer is "hard" down, it is round-robin'ed over without any real trouble.
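As a rough sketch, plain built-in load balancing in outputs.conf looks something like this (hostnames are examples, and 30 seconds is just the default rotation interval):

[tcpout]
defaultGroup = idx_cluster

[tcpout:idx_cluster]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# pick a new indexer from the list on a timer
autoLBFrequency = 30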

With indexer clustering, there really isn't a concept of an "inactive" indexer, except as a temporary state where it is being restarted as part of a rolling restart. When an indexer goes into a rolling restart, it closes its TCP input sockets. At the TCP level, forwarders then get a TCP RST / Connection Refused when they try to round-robin onto that restarting indexer ... so they move on to the next one.

Indexer acknowledgement is useful because it gives you an application-level "I got it" from the indexer. Normally, with TCP connections, there can be some amount of data sitting in the kernel's buffer that has been acknowledged by TCP but not yet read by the application. Without indexer acknowledgement, if Splunk crashes at that moment, you lose whatever was in those TCP buffers. Enabling indexer acknowledgement gives forwarders a slightly stronger guarantee. If you're going to the effort of data replication on your indexers, that slightly stronger guarantee is probably worth it.
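Turning it on is a one-line change in the same output group (again, just a sketch):

[tcpout:idx_cluster]
useACK = true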

In summary, don't use a third-party load-balancing product (F5, AWS ELB, NetScaler, ACE, etc.) between forwarders and indexers. It's not a supported configuration and is wholly unnecessary. Splunk's forwarder round-robin does just fine without one.

kmugglet
Communicator

Possibly flogging a dead horse with this.

We're looking at autoscaling our HF layer behind an ELB:

UF -> ELB -> HFs -> IDXs

Has anything changed significantly in the last 5 years?

Splunk doco still says no; just wondering if anyone is running this in the wild?

I'm assuming Splunk Cloud has some version of this setup running?
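For reference, the per-HF side of that picture is nothing exotic (ports and hostnames are just examples): each HF would listen for the UFs coming through the ELB and load-balance straight to the indexers.

# inputs.conf on each HF - receive from the UFs (via the ELB)
[splunktcp://9997]
disabled = false

# outputs.conf on each HF - Splunk's built-in load balancing to the indexers
[tcpout:idx_cluster]
server = idx1.example.com:9997, idx2.example.com:9997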

w531t4
Path Finder

I was debating this with a colleague yesterday. So long as SSL isn't terminated on the ELB, we couldn't come up with a reason not to proceed (cautiously). We set autoLBVolume to 1MB to get an even distribution.
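In outputs.conf that's just the following (the values are what we happen to use, not a recommendation):

[tcpout:idx_cluster]
server = idx1.example.com:9997, idx2.example.com:9997
# switch indexers after roughly 1MB of data, on top of the time-based rotation
autoLBVolume = 1048576
autoLBFrequency = 30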

kdoonan
Explorer

The stickiness wouldn't be set (since stickiness only works for HTTP listeners, not TCP). I guess that would mean the forwarder could potentially send events to a different indexer each time. Is there any issue with that?
