Deployment Architecture

Index cluster data imbalance with high vol data sources

brent_weaver
Builder

We are running an index cluster with 53 indexers and are finding that our high-volume sources cause data imbalance. We have this index cluster behind an AWS ELB. The high-volume data sources seem "sticky" about which indexer they write to, which causes the imbalances. I am looking to use indexer discovery in my next build in hopes that it will mitigate some of this behavior and be more intelligent about where the data is written.

This data source comes into an HEC tier and then goes on to the indexers. The index rebalance seems to be working fine, but I want to try to avoid this issue altogether!

Any thoughts are welcome. Thanks in advance!


mescober_splunk
Splunk Employee

If AWS ELB sticky sessions are enabled, subsequent HTTP requests will land on the same indexer. That is because HEC responses include a cookie.


s2_splunk
Splunk Employee

It sounds like you have an architectural constraint in your intermediary HEC tier.
How many Heavy Forwarders do you have receiving HEC traffic and forwarding to your indexers?
Ideally, you will need about 100 intermediary forwarding pipelines sending to your 53 indexer peers to prevent these data imbalance opportunities.

Also, do you have forceTimeBasedAutoLB enabled on your HEC forwarders' outputs.conf? Are you using the default autoLBFrequency?
You could enable forceTimeBasedAutoLB and lower your LB frequency. If you have a high-volume data source, the forwarder may not get an opportunity to switch indexers as frequently as you are expecting.
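
A minimal outputs.conf sketch of that suggestion for the HEC forwarders. The group name, the indexer host names, and the value 10 are placeholders chosen for illustration, not recommendations. The default autoLBFrequency is 30 seconds; the right value depends on your data volume and should be tested.

```ini
# outputs.conf on each HEC heavy forwarder (illustrative values only)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Hypothetical host names; list your own indexer peers here.
server = idx01.example.com:9997, idx02.example.com:9997
# Switch indexers on the time interval even when a stream has no natural break.
forceTimeBasedAutoLB = true
# Seconds between indexer switches; assumed value, lower than the 30 s default.
autoLBFrequency = 10
```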

Finally, having an (E)LB between your forwarders and your indexers is not a supported deployment.


harsmarvania57
Ultra Champion

Hi @brent_weaver,

I don't know much about AWS ELB, but Splunk recommends not using any third-party load balancer to send data between Splunk instances. To send data from the HF tier to the indexer cluster, use autoLB, which is Splunk's built-in automatic load balancing method.

When you are running HEC on the HF tier, the traffic flow should look something like this: Application -> AWS ELB -> HF Tier -> Indexer Cluster.
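
For the HF Tier -> Indexer Cluster hop, the indexer discovery mentioned in the question is also configured in outputs.conf on the forwarders. A rough sketch follows, assuming a cluster manager reachable at a made-up address. The stanza names, URI, and key are placeholders; newer Splunk versions name the URI setting manager_uri instead of master_uri, so check the outputs.conf spec for your version.

```ini
# outputs.conf on each heavy forwarder (placeholder names and values)
[indexer_discovery:cluster1]
# Must match the key set in the cluster manager's indexer discovery settings.
pass4SymmKey = <your_key>
# Hypothetical cluster manager address.
master_uri = https://cm.example.com:8089

[tcpout:cluster1_indexers]
# Fetch the peer list from the cluster manager instead of a static server list.
indexerDiscovery = cluster1

[tcpout]
defaultGroup = cluster1_indexers
```

With this in place the forwarders still use autoLB across the discovered peers, so no load balancer is needed between the HF tier and the indexers.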
