Getting Data In

Uneven distribution of forwarder connections to indexers

Hi,

My setup consists of a dozen indexers and a few hundred forwarders.

If I look at the distribution of forwarder connections across the indexers, most indexers hold about the same number of connections (±30%).
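For anyone wanting to reproduce this measurement: a search along these lines over the indexers' internal metrics should show the per-indexer connection counts (a sketch — `tcpin_connections` is the metrics group logged by receiving indexers, but exact field names may differ between versions):

```
index=_internal source=*metrics.log* group=tcpin_connections
| stats dc(sourceIp) AS forwarder_connections by host
| sort - forwarder_connections
```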

The exception is our 2nd indexer, which holds about twice as many connections as the others.

The (light) forwarders use auto load balancing to distribute their data.
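For context, the forwarder-side configuration looks roughly like this (a sketch of `outputs.conf` — the group name and indexer hostnames are placeholders; `autoLBFrequency` defaults to 30 seconds):

```
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx01:9997,idx02:9997,idx03:9997
autoLB = true
autoLBFrequency = 30
```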

Versions in use: 5.0.2 light forwarders / 5.0.5 indexers

What could be the problem?
Thanks in advance

EDIT:
Data is indexed only from log files on disk. Watching the forwarders switch between indexers, there is a steady change every 30 seconds; the 2nd server just gets chosen more often.

EDIT2:
We found a problem with streams not having an EOF, but the problem persists and has even become clearer (because we sorted out other misbehaviors).

1 Solution

Professional services have now most likely found a solution. We have not tested it yet, but it looks promising:

There is a known bug (I cannot find the SPL bug number right now) that matches our problem.
An upgrade to at least 5.0.5 should fix the uneven distribution of data.


Path Finder

We see similar behavior in our environment with 5.0.3 light forwarders. Can you please post the SPL bug number?


Due to an I/O bottleneck, data is distributed very unevenly across the indexers. Indexer 2 performs best and holds 7 times as much data as the worst indexer; that is another issue I am currently working on. The Splunk forwarders do not blacklist the other indexers for not responding, which I'd expect to be a symptom if that were the issue.
Connections from the search head are not counted.
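To quantify that skew, a quick per-indexer event count can be run from the search head (a sketch — `eventcount` with `summarize=false` reports a `server` field naming the indexer that holds the events):

```
| eventcount summarize=false index=*
| stats sum(count) AS events by server
| sort - events
```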


Motivator

Hello

Maybe this is what is happening on your second indexer:

Important: Universal forwarders are not able to switch indexers when monitoring TCP network streams of data (including Syslog) unless an EOF is reached or an indexer goes down, at which point the forwarder will switch to the next indexer in the list. Because the universal forwarder does not parse the data and identify event boundaries before forwarding the data to the indexer (unlike a heavy forwarder), it has no way of knowing when it's safe to switch to the next indexer unless it receives an EOF.

From docs: http://docs.splunk.com/Documentation/Splunk/6.0/Forwarding/Setuploadbalancingd

Check the type of data the 2nd indexer is receiving to see if this is your case.

Regards

It turns out this was a problem with some of our hosts reading from a named pipe.
It does not influence the number of connections, though: the forwarders still connect to different servers (while still sending their data to the first indexer they were connected to).


I already thought of that, but we are only indexing log files from disk. And even if we weren't, this lag in switching should be seen on all indexers, shouldn't it? I updated the original post.


Champion

But do you have the same data distribution across all the indexers? Connections don't really matter from my perspective, as the forwarders will always stay connected due to the heartbeat check. Another thing: how are the search peers configured on the search head? It may also be that the data resides on the 2nd indexer, so searches keep accessing it. Let's get some viewpoints from others as well.