Hey all,
We have many different forwarders installed on different hosts and we are migrating to a cluster of indexers. I've read about load balancing on the forwarders and from what I gather, if indexer acknowledgement is not turned on then there's a chance that the forwarder could try to forward to an inactive node?
To get around this, I was thinking about using an AWS ELB with a short session stickiness (say 30-60 seconds) and a tcp health check on the indexer port (so that if an indexer node goes down, it stops receiving traffic from the ELB).
So I have a few questions:
Thanks in advance.
In short, "don't do this". Let's start with the thought that it's basically an unsupported configuration. The forwarders don't require the help of an external load balancer to connect them to indexers - they are able to round-robin across indexers just fine on their own.
The built-in forwarder load balancing does just fine without indexer acknowledgement enabled. When an indexer is "hard" down, it is round-robin'ed over without any real trouble.
With indexer clustering, there really isn't a concept of an "inactive" indexer, except as a temporary state where it is being restarted as part of a rolling restart. When an indexer going into rolling restart, it closes the TCP input sockets. Now, at a TCP level, forwarders get a TCP RST / Connection Refused when they try to round-robin across that restarting indexer ... so they move on to the next one.
Indexer acknowledgement is useful because it gives you an application-level "I got it" from the indexer. Normally, with TCP connections, there can be some amount of data that is in the kernel's buffer which has been acknowledged by TCP but has not been read by the application. Without indexer acknowledgement, if Splunk crashes in that moment you lose what was in the TCP buffers. Enabling indexer acknowledgement gives forwarders a slightly stronger guarantee. If you're going to the effort of data replication on your indexers that slightly stronger guarantee is probably worth it.
In summary, don't use a 3rd party load balancing product (F5, AWS ELB, NetScaler, ACE, etc) between forwarders and indexers. It's not a supported configuration and is wholly un-necessary. Splunk's forwarder round robin does just fine without.
Hey, any updates on this? We're about to set up something similar and I'm going through the same thought process. Originally I figured having an ELB would mean managing less public IP's making security groups easier to manager. Based on dwaddle's answer below I'm thinking I'll instead expose a couple of intermediate forwarders and create an DNS A record for each, then use the Splunk built-in load balancing. Would love to hear your thoughts
Yes, intermediate forwarders would be much better than an ELB ... if for no other reason than Splunk support will actually support that configuration 🙂
In short, "don't do this". Let's start with the thought that it's basically an unsupported configuration. The forwarders don't require the help of an external load balancer to connect them to indexers - they are able to round-robin across indexers just fine on their own.
The built-in forwarder load balancing does just fine without indexer acknowledgement enabled. When an indexer is "hard" down, it is round-robin'ed over without any real trouble.
With indexer clustering, there really isn't a concept of an "inactive" indexer, except as a temporary state where it is being restarted as part of a rolling restart. When an indexer going into rolling restart, it closes the TCP input sockets. Now, at a TCP level, forwarders get a TCP RST / Connection Refused when they try to round-robin across that restarting indexer ... so they move on to the next one.
Indexer acknowledgement is useful because it gives you an application-level "I got it" from the indexer. Normally, with TCP connections, there can be some amount of data that is in the kernel's buffer which has been acknowledged by TCP but has not been read by the application. Without indexer acknowledgement, if Splunk crashes in that moment you lose what was in the TCP buffers. Enabling indexer acknowledgement gives forwarders a slightly stronger guarantee. If you're going to the effort of data replication on your indexers that slightly stronger guarantee is probably worth it.
In summary, don't use a 3rd party load balancing product (F5, AWS ELB, NetScaler, ACE, etc) between forwarders and indexers. It's not a supported configuration and is wholly un-necessary. Splunk's forwarder round robin does just fine without.
Possibly flogging a dead horse with this.
We're looking at autoscaling our HF layer behind an ELB
UF -> ELB -> HFs -> IDXs
Has anything changed significantly in the last 5 years.
Splunk doco still says no, just wondering if anyone was using this in the wild?
I'm assuming Splunk Cloud has some version of this setup running?
Splunk does work with ELB.
Checkout
I was debating this with a colleague yesterday. So long as SSL isn’t terminated on the ELB, we couldn’t come up with a justification for why we wouldn’t proceed with caution. We use AutoLBVolume 1MB to get an even distribution.
The stickiness wouldn't be set (since it only works with HTTP requests and not TCP). I guess that would mean the forwarder could potentially send events to a different indexer each time. Is there any issue with that?