Deployment Architecture

What to do when experiencing uneven indexing rate by instance?

justynap_ldz
Path Finder

Hello Splunkers,

From time to time we observe a somewhat strange state of our indexer cluster and would like to understand the reason.
There are 3 indexers in the cluster (let's say z1el1, z1el2, z1el3), one of which appears to be overloaded for some time (see the screenshot).
The internal logs do not show anything wrong or critical. The indexing rate goes up on one indexer and returns to normal after a few hours. The load balancer was checked some time ago by the responsible team and seems to be OK.
Can someone point us in the right direction on what else to check or do?


(screenshot: indexers_EW_20220315.JPG)


justynap_ldz
Path Finder

Thank you all for your support! 
Much appreciated!


PickleRick
SplunkTrust

If you have an indexer cluster, you most probably have either all your UFs pointing to a load-balancing group of all three indexers, or an intermediate HF layer also pointing to all indexers as a load-balancing group.

With just three indexers, each UF or HF hits any particular indexer with a 1/3 probability, so with sufficiently many forwarders the load should be pretty much evenly distributed. But if you have just one or two HFs, both of them may simply land on the same indexer and that's that. If I remember correctly, each connection target is chosen at random, so there is a theoretical possibility of hitting the same indexer several times in a row (less likely if you have more indexers).

You might try tweaking the load-balancing parameters to make the distribution more even. See the outputs.conf docs for parameters such as autoLBFrequency and autoLBVolume.
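For illustration, a minimal outputs.conf sketch on the forwarder side (the group name, port and volume threshold are assumptions here; only the indexer names come from your post):

[tcpout]
defaultGroup = z1_indexers

[tcpout:z1_indexers]
server = z1el1:9997, z1el2:9997, z1el3:9997
# pick a new indexer from the group at least every 30 seconds (the default)
autoLBFrequency = 30
# additionally switch after roughly 10 MB has been sent to the current indexer
autoLBVolume = 10485760

Lowering autoLBFrequency or setting autoLBVolume makes the forwarder rotate through the group more aggressively, at the cost of more frequent reconnections.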

isoutamo
SplunkTrust

It’s exactly that way. Even though we talk about a round-robin algorithm when the UF selects its next target, it’s actually a randomized round robin. In theory it could always connect to the same indexer! You should read the .conf presentation I linked in my earlier post.


PickleRick
SplunkTrust

Unfortunately my browser sometimes loses session cookies or something like that, so I have problems getting to the .conf and blog pages. It doesn't usually bother me much 😉

It's interesting, though, that random choice is the only available algorithm. Then again, a strict round robin would not necessarily be a great idea either: all forwarders could, for example, hit the same indexer and then, after the switch period, all move on to the next one, again the same indexer for everyone. So in a reasonably sized environment the randomness should even itself out. Mostly.

 

 


gcusello
SplunkTrust

Hi @justynap_ldz,

I see that the problem occurs only for a short time and that during the other hours the load is more or less equally distributed.

I suggest monitoring your infrastructure to understand whether it's a momentary lapse of reason (just to quote Pink Floyd!) or a recurring problem.
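As a rough sketch, a search like this over the internal metrics shows the indexing rate per indexer over time (it assumes your indexers forward their _internal logs as usual, so the host field contains the indexer names):

index=_internal source=*metrics.log* group=thruput name=thruput
| timechart span=10m sum(kb) AS indexed_kb BY host

If one host regularly spikes while the others stay flat, you can see exactly when the imbalance happens and correlate it with what the forwarders were doing at that time.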

Do you have some Heavy Forwarders?

Because I was told during a training (not confirmed by Professional Services) that HFs keep sending their logs to one indexer for as long as it stays alive, so maybe that's the situation here.

If the problem continues, I suggest opening a case with Splunk Support; in the meantime you could configure your UFs and HFs to force an indexer change, e.g. every 5 minutes.
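As a sketch of that last idea (the group name and port are assumptions, adapt them to your own outputs.conf):

[tcpout:z1_indexers]
server = z1el1:9997, z1el2:9997, z1el3:9997
# force a switch to another indexer at least every 5 minutes
autoLBFrequency = 300
# on UFs sending unparsed data this forces the switch even for long-running streams
forceTimebasedAutoLB = true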

Ciao.

Giuseppe

 

 

justynap_ldz
Path Finder

Hi,

thank you @gcusello!
Yes, we have 2 Heavy Forwarders. 
We will consider changing the config to force indexer change.


gcusello
SplunkTrust

Hi @justynap_ldz,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors 😉


isoutamo
SplunkTrust

Hi

here is an excellent presentation on how to find issues with forwarded data:

https://conf.splunk.com/watch/conf-online.html?search=FN1402

It also presents some tools for looking at this on your own cluster.

r. Ismo


asokoya
Loves-to-Learn

This link does not seem to be working. Can you please update it?
