Why am I experiencing indexer congestion after 6.5.0 upgrade?

johnpof
Path Finder

I have four independent indexers in a round robin: two are fairly old, one is a year old, and my newest is maybe 3-4 months old.

hot/warm is on an SSD Mirror, cold is on spinning disk but currently barely used at all (thresholds not met yet)

Right after upgrading to 6.5.0, my newest indexer started filling up ALL of its indexing queues. I've taken it in and out of the round robin a bunch of times, and although it happens less often, it's still randomly filling up all queues and then it stops indexing data.

All disks are healthy, with no IO waits or anything. I've watched the disks while this was happening and there are no issues with them.

Could these errors have something to do with it?

10-26-2016 13:00:08.697 -0700 WARN  LineBreakingProcessor - Truncating line because limit of 10000 bytes has been exceeded with a line length >= 13431 - data_source="/opt/splunk/var/log/splunk/remote_searches.log", data_host="splunk08", data_sourcetype="splunkd_remote_searches"
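
For anyone triaging these warnings, here is a minimal Python sketch that pulls the fields out of a LineBreakingProcessor truncation message like the one above. The regex is derived only from that single log line, so treat the pattern and the function name `parse_truncation_warning` as assumptions rather than a documented format.

```python
import re
from typing import Optional

# Pattern built from the one warning quoted in this thread (an assumption,
# not an official description of the message layout).
TRUNCATION_RE = re.compile(
    r"Truncating line because limit of (?P<limit>\d+) bytes has been exceeded "
    r"with a line length >= (?P<length>\d+) - "
    r'data_source="(?P<source>[^"]*)", '
    r'data_host="(?P<host>[^"]*)", '
    r'data_sourcetype="(?P<sourcetype>[^"]*)"'
)


def parse_truncation_warning(line: str) -> Optional[dict]:
    """Return the limit, observed length, source, host and sourcetype, or None."""
    match = TRUNCATION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["limit"] = int(fields["limit"])
    fields["length"] = int(fields["length"])
    return fields
```

The 10000-byte limit in the message matches the default of the `TRUNCATE` setting in props.conf, so a warning like this most likely means one long event was cut short. On its own it does not point to blocked queues.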

bport15
Path Finder

We're also seeing this issue since upgrading from 6.4.1 to 6.5.2 in May. Restarting the indexer temporarily relieves the queuing but it immediately rolls over to another indexer, we start dropping data, and then have to restart that indexer. I don't have the time or the patience to babysit queuing all day long and bounce our indexers. I'd love to know what LDAP errors/warnings people were seeing so I can determine if we're having the same issue. Thanks!

ptoro
Explorer

UPDATE:

So finally got a Splunk engineer and went through the whole upgrade process. Long story short, LDAP was causing my indexers to bomb out. LDAP!!!!!!!

From what Splunk said, the indexer should not be doing lookups on queries coming from the search heads. I also see more (almost constant) LDAP warnings in 6.5.x than in 6.4.x. So LDAP lookups were timing out on the indexers and causing blocking. I disabled LDAP on the indexers and there was no more blocking.

Food for thought guys.

This at least doesn't affect our end users since the search heads still do LDAP and also proves that once the search is passed to the indexer, the user does not need to be in the indexer (auth wise). So Splunk will have to figure out why the indexers are trying to do auth on search head queries but at least it's a quick fix.
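
To see whether an indexer is still set to LDAP, here is a rough Python sketch that reads the `authType` key from the `[authentication]` stanza of an authentication.conf. Assumptions: the function names are hypothetical, the file is simple enough for Python's `configparser` to read, and only the text you pass in is checked, so Splunk's layering of default, local, and app configs is ignored.

```python
import configparser
from typing import Optional


def read_auth_type(conf_text: str) -> Optional[str]:
    """Return authType from the [authentication] stanza, or None if it is not set."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case as written (authType, not authtype)
    parser.read_string(conf_text)
    return parser.get("authentication", "authType", fallback=None)


def uses_ldap(conf_text: str) -> bool:
    """True when this one file sets authType to LDAP (case-insensitive)."""
    auth_type = read_auth_type(conf_text)
    return auth_type is not None and auth_type.strip().lower() == "ldap"
```

Any other value (for example the built-in `Splunk` type), or no value at all in that file, means the file itself is not pointing the instance at LDAP. Because layering is ignored, a `False` here is only a hint, not proof that LDAP is off.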

woodcock
Esteemed Legend

According to the amazing @davidpaper, the full details are:

Beginning with Splunk 6.5, the indexing threads wrongly attempt to authenticate during indexing, adding significant load on LDAP based external authentication services. This authentication incorrectly attempts to authenticate non-existing external auth users, in this case the splunk system user.

And can be found in Section 25 here:
https://docs.splunk.com/Documentation/Splunk/6.6.2/Security/ConfigureLDAPwithSplunkWeb

The key is to look for “Operations error” in splunkd.log, which indicates that the LDAP services are failing to keep up with the rate of queries.
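
A minimal Python sketch of that check, counting lines that contain "Operations error" per hour. It assumes each splunkd.log line starts with the `MM-DD-YYYY HH:MM:SS.mmm` timestamp seen in the warning quoted in the original question; the function name and the per-hour bucketing are illustrative choices, and nothing here depends on the rest of the LDAP message format, which this thread does not show.

```python
from collections import Counter
from typing import Iterable


def count_operations_errors(lines: Iterable[str],
                            needle: str = "Operations error") -> Counter:
    """Count lines containing `needle`, keyed by 'MM-DD-YYYY HH'."""
    per_hour: Counter = Counter()
    for line in lines:
        if needle not in line:
            continue
        # The first 13 characters of "MM-DD-YYYY HH:MM:SS.mmm" are the date and hour.
        per_hour[line[:13]] += 1
    return per_hour


# Example use (the path is an assumption based on the log directory in this thread):
# with open("/opt/splunk/var/log/splunk/splunkd.log", errors="replace") as fh:
#     for hour, hits in sorted(count_operations_errors(fh).items()):
#         print(hour, hits)
```

A steady stream of hits on the indexers, lining up with the times the queues block, would be consistent with the LDAP overload described above.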

tellsworth_splu
Explorer

@ptoro

Are you seeing the LDAP lookups timing out in the Splunk logs or in your LDAP logs?

ptoro
Explorer

In the Splunk logs.

rabitoblanco
Path Finder

@ptoro any recollection of what LDAP-related messages you were seeing on your indexers?

I have a bunch of "user=xyz not found" but I doubt that's a big deal...

o_calmels
Communicator

Hi,

We have exactly the same problem on our 2 indexers.

Can you explain how to check whether the problem is LDAP (what can I search for?)
And if it is the same problem, how do I disable LDAP on the indexers? (I only use LDAP authentication for Splunk users on the search head.)

Cheers.
Olivier.

bport15
Path Finder

So index=_internal sourcetype=splunkd? Is there a specific error message to look for in the splunkd logs?

ptoro
Explorer

Had to revert to 6.4 on our indexers. So far so good...

ptoro
Explorer

I am having the exact same issue after upgrading to 6.5 (from 6.4).

We have 4 indexers

"hot/warm is on an SSD Mirror, cold is on spinning disk but currently barely used at all (thresholds not met yet)" -> same setup as us

Before the upgrade we never had this issue, even though we already had those LineBreakingProcessor warnings.

johnpof
Path Finder

So far I haven't seen any congestion since I made the change. I'm about to put full load on it now, so we'll see what happens.

johnpof
Path Finder

Errors are gone, thanks!
