I have multiple indexers in production and I would like to use the REST API to input data using receivers/stream.
The question I have is: should I send the REST API data to a forwarder, which would then forward it to all the indexers? Or can I only send the data to one indexer this way?
Using a Heavy/Light Weight Forwarder in this context would insulate you against problems with an individual indexer, but it would not be a good architecture for properly load balancing across your indexers. The forwarder only rebalances to a new indexer every 30 seconds by default, meaning that if you have only one forwarder, you'll only be intermittently loading data into each indexer. You could create a pool of forwarders in a 1:1 relationship with your indexers, which would ensure all indexers were being fed data at any given time. At that point, though, you're not gaining much from the forwarder beyond a smallish buffer (by default) in case an indexer goes down.
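For reference, the rebalance interval mentioned above is controlled in the forwarder's outputs.conf. A minimal sketch, with placeholder hostnames and the default port 9997:

```
# outputs.conf on the forwarder -- hostnames are placeholders
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
# seconds between switching target indexers; 30 is the default
autoLBFrequency = 30
```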
A better architecture would be to put a proper load balancer (F5, NetScaler, Cascade, nginx/apache in reverse proxy mode) that offers intelligent load balancing in front of splunkd. The load balancer will have a number of options available for distributing load, such as least connections, best response times, or simple round-robin. It can also be set up with a health check to ensure Splunk is up and responding before sending any new connections to splunkd.
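As a rough illustration of the reverse-proxy option, here is a minimal nginx sketch (hostnames, ports, and certificate paths are placeholders; splunkd's REST interface speaks HTTPS on 8089 by default). Note that `max_fails`/`fail_timeout` are passive health checks in open-source nginx; active `health_check` probes require NGINX Plus:

```
# nginx reverse proxy in front of splunkd REST -- names are assumptions
upstream splunk_rest {
    least_conn;                       # least-connections balancing
    server indexer1.example.com:8089 max_fails=3 fail_timeout=30s;
    server indexer2.example.com:8089 max_fails=3 fail_timeout=30s;
}

server {
    listen 8089 ssl;
    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass https://splunk_rest;
        proxy_ssl_verify off;         # splunkd ships a self-signed cert by default
    }
}
```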
Thanks for the inputs. Your answer answered another one of my questions. I was seeing very uneven indexing (in terms of volume of data indexed) across my two indexers, even though all my forwarders know about both indexers. I think what is happening is that one of my job servers (which is a heavy forwarder) runs scripted inputs and transfers large volumes of data into a single indexer.
Authentication is a good point, but I'd still recommend a load balancer in front of the indexers, just with sessions kept sticky to one particular indexer. If you want more distribution, have your application open multiple connections to the load balancer and you should get good distribution even with sticky sessions. The forwarder provides very little, if any, value in this scenario. A plain TCP-based input will lose you a number of features, including the ability to define index/host/source/sourcetype on a per-POST basis.
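To make the per-POST metadata point concrete, here is a small Python sketch that builds a receivers/stream URL carrying index/host/source/sourcetype as query parameters. The hostname, port, and metadata values are placeholders; the actual POST (with authentication) is left out so the sketch stays self-contained:

```python
# Build a receivers/stream URL with per-POST metadata.
# Hostnames and metadata values below are illustrative only.
from urllib.parse import urlencode

def stream_url(splunk_host, index, source, sourcetype, port=8089):
    """Return the receivers/stream endpoint URL with per-POST metadata."""
    params = urlencode({
        "index": index,
        "source": source,
        "sourcetype": sourcetype,
    })
    return f"https://{splunk_host}:{port}/services/receivers/stream?{params}"

url = stream_url("indexer1.example.com", "main", "myapp", "myapp:json")
print(url)
```

Each POST can target a different index or sourcetype simply by changing the query parameters, which is the flexibility you give up with a raw TCP input.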
The load balancer approach is good for tcp and splunktcp inputs, but adoshi wants to use the receivers/stream REST endpoint to input data. So there has to be a point-to-point authenticated REST session between the REST client and the REST server (the forwarder in this case). You might be able to slot in a load balancer between the forwarder and the indexers, however, as that traffic would be splunktcp, and the forwarder's outputs.conf entry would just point to the load balancer's VIP. So the architecture would be: REST client -> Splunk Forwarder -> Load Balancer -> Indexer Cluster
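In that layout the forwarder's outputs.conf might look something like this (the VIP hostname and port are placeholders):

```
# outputs.conf on the forwarder, pointing splunktcp at the LB's VIP
[tcpout]
defaultGroup = lb_vip

[tcpout:lb_vip]
server = splunk-lb-vip.example.com:9997
```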
That is an architecture that I have implemented before.
REST client -> Heavy/Light Forwarder -> auto load balanced into Indexer cluster.
Otherwise your REST client would have to implement the auto load balancing logic itself.
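If you did want the REST client to handle distribution itself, a hypothetical sketch of client-side round-robin across the indexers (hostnames are placeholders) could look like this:

```python
# Hypothetical client-side load balancing: rotate the target indexer
# for each REST POST. Hostnames below are illustrative only.
from itertools import cycle

class RoundRobinTargets:
    """Yield the next indexer endpoint for each REST POST."""
    def __init__(self, targets):
        self._cycle = cycle(targets)

    def next_target(self):
        return next(self._cycle)

rr = RoundRobinTargets([
    "indexer1.example.com:8089",
    "indexer2.example.com:8089",
])
print([rr.next_target() for _ in range(4)])
```

This only spreads load; unlike a real load balancer it does no health checking, so the client would also need retry logic for a downed indexer.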