Deployment Architecture

What to do when experiencing uneven indexing rate by instance?

justynap_ldz
Path Finder

Hello Splunkers,

From time to time we observe a somewhat strange state of our indexer cluster and would like to understand the reason.
There are 3 indexers in the cluster (let's say z1el1, z1el2, z1el3), one of which appears to be overloaded for some time (see the screenshot).
The internal logs do not show anything wrong or critical. The indexing rate goes up on one indexer and returns to normal after a few hours. The load balancer was checked some time ago by the responsible team and seems to be OK.
Can someone point us in the right direction on what else to check or do?


(screenshot: indexers_EW_20220315.JPG)


justynap_ldz
Path Finder

Thank you all for your support! 
Much appreciated!


PickleRick
SplunkTrust

If you have an indexer cluster, you most probably have either all your UFs pointing to a load-balancing group of all three indexers, or an intermediate HF layer also pointing to all indexers as a load-balancing group.

With just three indexers, each UF or HF hits any particular indexer with a 1/3 probability, so with sufficiently many forwarders the load should be pretty much evenly distributed. But if you have just one or two HFs, both of them may simply land on the same indexer and that's that. If I remember correctly, each connection target is chosen at random, so there is a theoretical possibility of hitting the same indexer several times in a row (less likely if you have more indexers).

You might try tweaking the load-balancing parameters to make the distribution more even. See the outputs.conf docs for parameters such as autoLBFrequency and autoLBVolume.
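For illustration, a minimal outputs.conf sketch on the forwarder side (the group name, port and volume threshold are assumptions here; only the indexer names come from your post):

[tcpout]
defaultGroup = z1_indexers

[tcpout:z1_indexers]
server = z1el1:9997, z1el2:9997, z1el3:9997
# pick a new indexer from the group at least every 30 seconds (the default)
autoLBFrequency = 30
# additionally switch after roughly 10 MB has been sent to the current indexer
autoLBVolume = 10485760

Lowering autoLBFrequency or setting autoLBVolume makes the forwarder rotate through the group more aggressively, at the cost of more frequent reconnections.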

isoutamo
SplunkTrust

It’s exactly that way. Even though we talk about a round-robin algorithm when the UF selects its next target, it’s actually a randomized round robin. In theory it could always connect to the same indexer! You should read the .conf presentation I linked in my earlier post.


PickleRick
SplunkTrust

Unfortunately my browser sometimes loses session cookies or something like that, so I have problems getting to the .conf and blog pages. It doesn't usually bother me much 😉

It's interesting, though, that random choice is the only available algorithm. Then again, a strict round robin would not necessarily be a great idea either: all forwarders could, for example, hit the same indexer and then, after the switch period, all move on to the next one, again the same indexer for everyone. So in a reasonably sized environment the randomness should even itself out. Mostly.

 

 


gcusello
SplunkTrust

Hi @justynap_ldz,

I see that the problem occurs only for a short time and that during the other hours the load is more or less equally distributed.

I suggest monitoring your infrastructure to understand whether it's a momentary lapse of reason (just to quote Pink Floyd!) or a recurring problem.
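As a rough sketch, a search like this over the internal metrics shows the indexing rate per indexer over time (it assumes your indexers forward their _internal logs as usual, so the host field contains the indexer names):

index=_internal source=*metrics.log* group=thruput name=thruput
| timechart span=10m sum(kb) AS indexed_kb BY host

If one host regularly spikes while the others stay flat, you can see exactly when the imbalance happens and correlate it with what the forwarders were doing at that time.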

Do you have some Heavy Forwarders?

Because I was told during a training (not confirmed by Professional Services) that HFs keep sending their logs to one indexer for as long as it stays alive, so maybe that's the situation here.

If the problem continues, I suggest opening a case with Splunk Support; in the meantime you could configure your UFs and HFs to force an indexer change, e.g. every 5 minutes.
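As a sketch of that last idea (the group name and port are assumptions, adapt them to your own outputs.conf):

[tcpout:z1_indexers]
server = z1el1:9997, z1el2:9997, z1el3:9997
# force a switch to another indexer at least every 5 minutes
autoLBFrequency = 300
# on UFs sending unparsed data this forces the switch even for long-running streams
forceTimebasedAutoLB = true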

Ciao.

Giuseppe

 

 

justynap_ldz
Path Finder

Hi,

thank you @gcusello!
Yes, we have 2 Heavy Forwarders. 
We will consider changing the config to force indexer change.


gcusello
SplunkTrust

Hi @justynap_ldz,

Good for you, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors 😉


isoutamo
SplunkTrust

Hi

here is an excellent presentation on how to find issues with forwarded data:

https://conf.splunk.com/watch/conf-online.html?search=FN1402

It also presents some tools for looking at this on your own cluster.

r. Ismo


asokoya
Loves-to-Learn

This link does not seem to be working. Can you please update it?
