Splunk Stream: How to configure the app to spread load across all indexers?

Contributor

Hello,

We are trying to set up the Splunk Stream app for NetFlow capture, but we are confused about how data is distributed to the indexers.

Because we expect a large volume of data, and because of the limitations of the wire data modular input for Splunk Stream, we plan the following setup:

1) Install an independent Stream forwarder on a Linux machine.
2) Configure the HTTP Event Collector (HEC) on the indexer to receive the data sent by the Stream forwarder.
3) Configure the flow collector in the streamfwd.conf file.

According to the documentation, the best practice for scaling flow ingestion is to use an independent Stream forwarder with Nginx or another load balancer to distribute the load across the indexer cluster (http://docs.splunk.com/Documentation/StreamApp/7.0.1/DeployStreamApp/ConfigureFlowcollector).

However, we are configuring the HTTP Event Collector and distributing the [http://streamfwd] stanza to all the indexers (per the doc: http://docs.splunk.com/Documentation/StreamApp/7.0.1/DeployStreamApp/InstallStreamForwarderonindepen...). Will that not distribute the load across all indexers? As I understand it, the token stanza will be present on every indexer in the cluster, so where and why is Nginx required?

Thanks
Hemendra

Splunk Employee

Hello @hemendralodhi,
A load balancer is needed to scale event ingestion beyond what a single HEC endpoint can process. I don't have exact performance numbers (yet), but as a ballpark, a single independent Stream forwarder instance can collect, process, and push up to roughly several TB/day of NetFlow data to the indexers. A single HEC-enabled indexer can't handle that load, which is why we recommend a load-balanced architecture.
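To put "several TB/day" in perspective, here is a rough back-of-the-envelope sketch in Python. The 3 TB/day figure and the indexer counts are illustrative assumptions, not measured numbers, and TB is taken as 10^12 bytes.

```python
SECONDS_PER_DAY = 86_400


def per_indexer_mb_per_s(tb_per_day: float, indexers: int) -> float:
    """Average ingest rate each indexer sees if the load is spread evenly.

    Assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and a
    perfectly even split, which a real load balancer only approximates.
    """
    if indexers < 1:
        raise ValueError("indexers must be at least 1")
    bytes_per_s = tb_per_day * 10**12 / SECONDS_PER_DAY
    return bytes_per_s / 10**6 / indexers


if __name__ == "__main__":
    # Hypothetical volume: 3 TB/day of NetFlow events.
    for n in (1, 4, 8):
        rate = per_indexer_mb_per_s(3, n)
        print(f"{n} indexer(s): {rate:.2f} MB/s each")
```

With these assumed numbers, a single HEC endpoint would have to sustain roughly 35 MB/s on average, while spreading the same volume across four indexers brings that below 9 MB/s each.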

Contributor

Thanks, vshcherbakov, for the response.

I am still trying to understand how the Stream forwarder sends data to the indexers. Since the Stream app will be installed on the search head, I assume that to enable HEC on the indexers we have to install the app on the indexers as well and enable HEC there?

So data forwarding is based on the [http://streamfwd] stanza. To distribute the load across all indexers, we have to copy the same stanza to every indexer and deploy Nginx: Stream forwarder --------> Nginx -------> indexer cluster. How will the Stream forwarder talk to Nginx?

Thanks
Hemendra

Splunk Employee

The Stream forwarder takes its configuration from the search head (SH) and from local config files.

You can configure the HEC endpoints in the Stream app under Configuration -> Distributed Forwarder Management, in the "Edit Forwarder Group" dialog box: uncheck the HEC autoconfig option first, then enter the Nginx endpoint in the Endpoint URLs text box.
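For the Nginx side, a minimal sketch of what the load-balancing config could look like, rendered here by a small Python helper. The indexer hostnames, the upstream name splunk_hec, and the helper itself are hypothetical; port 8088 is the HEC default, and the output is meant to sit inside the http context of nginx.conf. TLS on the Nginx listener (certificates and so on) is left out for brevity.

```python
def render_hec_upstream(indexers, hec_port=8088, listen_port=8088, scheme="https"):
    """Render an Nginx upstream/server pair that spreads HEC traffic over indexers.

    Nginx uses round-robin across the upstream servers by default.
    `scheme` should match how HEC is exposed on the indexers (HEC uses
    HTTPS unless SSL has been disabled).
    """
    if not indexers:
        raise ValueError("at least one indexer is required")
    servers = "\n".join(f"    server {host}:{hec_port};" for host in indexers)
    return (
        "upstream splunk_hec {\n"
        f"{servers}\n"
        "}\n"
        "server {\n"
        f"    listen {listen_port};\n"
        "    location / {\n"
        f"        proxy_pass {scheme}://splunk_hec;\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical indexer hostnames.
    print(render_hec_upstream(["idx1.example.com", "idx2.example.com", "idx3.example.com"]))
```

The address Nginx listens on is then what goes into the Endpoint URLs text box, so the Stream forwarder only ever talks to Nginx and never needs to know the individual indexers.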

The Stream app only needs to be installed on the SH, but the HEC config file it creates there should be replicated to the indexers so that they all have the same HEC token and related settings.
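One way to check that the token really is identical everywhere is to send the same test event to the Nginx endpoint and to each indexer directly. The sketch below only builds the requests; the hostnames and the all-zero token are placeholders, while /services/collector/event and the "Authorization: Splunk <token>" header are the standard HEC conventions.

```python
import json
import urllib.request

HEC_PATH = "/services/collector/event"  # standard HEC event endpoint


def build_hec_request(base_url: str, token: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a HEC POST carrying one test event."""
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + HEC_PATH,
        data=body,
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: substitute your own token and hosts.
    token = "00000000-0000-0000-0000-000000000000"
    targets = [
        "https://nginx.example.com:8088",  # load balancer
        "https://idx1.example.com:8088",   # individual indexers
        "https://idx2.example.com:8088",
    ]
    for base in targets:
        req = build_hec_request(base, token, {"msg": "hec smoke test"})
        print(req.get_method(), req.full_url)
        # To actually send: urllib.request.urlopen(req, timeout=5)
```

If any indexer rejects the token while the others accept it, the HEC stanza has not been replicated consistently.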

Contributor

Thanks again. It is very helpful. We will configure it accordingly.
