Getting Data In

Distributed Environment: Why is one indexer receiving more data than another from the universal forwarder?

Raghav2384
Motivator

Hello Experts, I would like to thank whoever has been helping me out for the past couple of days. I have set up a little distributed environment at home, all with Splunk Enterprise trial licenses. I have 1 search head, 2 indexers and 3 forwarders, with the indexers set as search peers. Here's my question: I have installed a forwarder on an Ubuntu system and have been checking the event counts on the indexers.
Indexer A and Indexer B have pretty much the same CPU and system configurations.
I see Indexer A gets most of the stuff from the Ubuntu forwarder while Indexer B receives very little. The interesting thing is that I set up Indexer A after Indexer B. Is there any config rule that makes a forwarder favor an indexer, or is it strictly based on the power of the laptop hosting the indexer?
Thanks in advance 🙂
Raghav

1 Solution

triest
Communicator

Since you say "Indexer A gets most of the stuff [...] and Indexer B is receiving very little", I'm going to assume you see some data from the Ubuntu host on Indexer B. That assumption rules out issues where the search head is only distributing the search to one indexer, and issues with the indexer not receiving data at all.

Since this is a new environment, I'm assuming these are fairly new versions and thus SPL-69922 is not relevant (you can read more in the 5.0.5 release notes).

Universal Forwarders have no concept of an event; they just see a stream of data, and it's the heavy forwarder or indexer down the road that does the event breaking. To avoid sending part of an event to one indexer and the rest to another, when a Universal Forwarder sees a log file get updated, it tries to read to the end of the file (and then waits three seconds for more data) and sends all of that to the same server. For a very heavy load (e.g. NetFlow for a big network), this can prevent a Universal Forwarder from load balancing properly. I doubt you have enough throughput to cause that, but if you're indexing old logs, a similar effect can occur. It's probably best just to let it happen in the case of old logs, but if it's a matter of the velocity of the logs, you can use autoLBFrequency in outputs.conf to force it to change indexers more often, though you increase the chances of having an event split.
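
For reference, a minimal sketch of what that might look like in outputs.conf on the forwarder (the stanza name, ports, and value here are illustrative; 30 seconds is the default autoLBFrequency):

```ini
# outputs.conf on the Universal Forwarder (illustrative sketch)
[tcpout:my_indexers]
server = indexerA:9997, indexerB:9997
autoLB = true
# How often (in seconds) the forwarder tries to switch indexers.
# Lower values spread data more evenly but raise the risk of
# splitting an event across two indexers.
autoLBFrequency = 30
```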

Indexer selection is random and not a shuffle. Thus if a forwarder is only configured to send data to two servers, there's a 50% chance that it will choose the same server. So for small data sets, you would expect to see some indexer affinity.

Finally (and I doubt this applies in your case), I wrote a query to look for indexer affinity per forwarder, and I noticed our jobs server was a big offender. That makes sense, as it doesn't write data very often, but for our summary indexing tasks it can write large amounts of data in a short time, and thus on a per-day or even per-week basis it shows some affinity. Long term, though, it's a fairly even load.
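
If you want to check affinity yourself, a search along these lines (a sketch, run against the indexers' _internal data; the field names come from the tcpin_connections metrics events) shows how much each forwarder sent to each indexer:

```
index=_internal source=*metrics.log* group=tcpin_connections
| stats sum(kb) AS total_kb BY hostname, splunk_server
| sort hostname, -total_kb
```

Here `hostname` is the sending forwarder and `splunk_server` is the indexer that received the data; a healthy spread shows similar totals per indexer for each forwarder.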

If you leave things up for a day, how do that day's logs look?

Good luck with the problem; we have 1700+ forwarders and went back and forth with support for over 6 months working towards getting a better spread (4/20 had twice as much data as the smallest indexer). In the end, there wasn't a silver bullet; for some sources (like NetFlow) setting the auto load balance frequency helped, while for others it was a matter of getting the forwarder version upgraded. The big thing is to keep an eye on warnings and errors in the _internal logs, so that if the connection is timing out or unavailable you detect it and can troubleshoot it as at least part of the cause of your indexer affinity.
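
As a starting point for that, something like this surfaces output-related warnings and errors (a sketch; the exact component names can vary by Splunk version):

```
index=_internal sourcetype=splunkd log_level!=INFO
    (component=TcpOutputProc OR component=AutoLoadBalancedConnectionStrategy)
```

Connection timeouts or "cooked connection" errors here usually point at the indexer that is being skipped.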


linu1988
Champion

I believe the autoLB functionality balances requests rather than data volume. That means one 30-second interval might send 5 MB to one indexer, then for the next 30 seconds it switches to the next indexer and sends 10 MB. Each indexer sees the same number of requests, but the amount of data per request varies. Isn't it that straightforward?
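
You can see that distinction directly in the metrics data; a sketch (same tcpin_connections fields as above) comparing connection counts against data volume per indexer:

```
index=_internal source=*metrics.log* group=tcpin_connections
| stats count AS connections sum(kb) AS total_kb BY splunk_server
```

If `connections` is roughly even but `total_kb` is not, the forwarder is rotating on schedule and the skew comes from how much data each interval happens to carry.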



Raghav2384
Motivator

Hi Martin - I used the basic outputs for my UFs.
Example:
[tcpout:groupname]
server = indexerA:port, indexerB:port
autoLB = true
I didn't set any autoLBFrequency.

@srioux - both my indexers are set up as search peers and I did set autoLB = true... I will blame Indexer B's performance if this doesn't take me anywhere 🙂


MuS
Legend

Take a look here: http://answers.splunk.com/answers/62908/universal-forwarder-not-load-balancing-to-indexers.html

Also note that older UFs tend to develop an affinity between the forwarder and the indexer to which they connect.


martin_mueller
SplunkTrust
SplunkTrust

I was talking about the frequency of inputs, for example "execute top every 60 seconds and index the output". If you do that with the default autoLBFrequency of 30 seconds then every run will end up on the same indexer.
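
For example, a scripted input like this (path, interval, and sourcetype are illustrative) whose interval is a multiple of the autoLBFrequency will tend to land every run on whichever indexer happens to be active at that moment:

```ini
# inputs.conf on the forwarder (illustrative sketch)
[script://./bin/top.sh]
# Runs exactly every 60 seconds; with the default 30-second
# autoLBFrequency, each run can line up with the same indexer.
interval = 60
sourcetype = top
```

Staggering the interval, or changing autoLBFrequency so the two cycles don't line up, breaks that lock-step.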


srioux
Communicator

Also, I didn't see a description of the inputs. I've seen situations where Universal Forwarders aren't smart enough to delineate events, due to their lack of processing pipelines, so they get stuck forwarding to a single indexer and never load balance properly.


srioux
Communicator

Unless indexerB is actually not responding to the forwarder's attempt to connect, I'm not sure its "performance" would be at fault.

You could also look at the internal logs on both the indexer and the forwarder; they might give you a better indication of what's going on:
$SPLUNK_HOME/var/log/splunk/splunkd.log
(depending on settings, you should also be able to search these via "index=_internal" in the Splunk UI)
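
A quick sanity check along those lines (the host value is a placeholder - substitute your forwarder's hostname) counts that forwarder's internal events per indexer:

```
index=_internal host=<your_forwarder_host> sourcetype=splunkd
| stats count BY splunk_server
```

If one indexer dominates the count even for the forwarder's own internal logs, the affinity is on the forwarder side rather than in your data.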


srioux
Communicator

Might be worth checking your outputs.conf configuration as well, to make sure that both indexers are in the forwarder's target group and that autoLB is indeed set to "true".

More info on outputs.conf in docs:
http://docs.splunk.com/Documentation/Splunk/6.1.3/admin/Outputsconf

You can see your effective outputs settings with the Splunk btool command:
$SPLUNK_HOME/bin/splunk cmd btool outputs list


martin_mueller
SplunkTrust
SplunkTrust

Are you possibly indexing things at fixed intervals, such as running unix scripted inputs once a minute?
