Linux - splunkd v4.1.4 crash with LDAP authentication enabled

scarteratwork · ‎09-01-2010

Enabling LDAP - splunkd crash on startup.

Running Splunk standalone (i.e. not clustered as per previous post)
Splunk v4.1.4 (build 82143).
LDAP against Windows Server 2003 Active Directory. Server hit has a global catalog.
ldapsearch tests for both groups & users are successful as per splunk docs.
Have set groupBaseFilter to only include (cn=APP-Splunk*) groups (3 exist)
Have set userBaseFilter to only include my account (cn=myname)
splunkd_stderr.log says: src/tcmalloc.cc:353] Attempt to free invalid pointer: 0x1b00010
Last line in splunkd.log says: INFO loader - Instantiated plugin: thruputprocessor
Running on physical box with 8 cores & 16GB RAM. SLES 11 amd64.
Reverting back to Splunk (internal) authentication allows Splunk to start clean.
Crash log output below.

Any ideas?

[build 82143]
Received fatal signal 6 (Aborted).
 Cause:
   Signal sent by PID 29447 running under UID 0.
 Crashing thread: Main Thread
 Registers:
    RIP:  [0x00007F38F5A9C645] gsignal + 53 (/lib64/libc.so.6)
    RDI:  [0x0000000000007307]
    RSI:  [0x0000000000007310]
    RBP:  [0x00007F38F5465F80]
    RSP:  [0x00007F38F5465AF8]
    RAX:  [0x0000000000000000]
    RBX:  [0x00007F38F5465C30]
    RCX:  [0xFFFFFFFFFFFFFFFF]
    RDX:  [0x0000000000000006]
    R8:  [0x00007F38F5B837C0]
    R9:  [0x2064696C61766E69]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000202]
    R12:  [0x0000000000F4BBA0]
    R13:  [0x0000000000000000]
    R14:  [0x0000000000000000]
    R15:  [0x0000000000001000]
    EFL:  [0x0000000000000202]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace:
  [0x00007F38F5A9DC33] abort + 387 (/lib64/libc.so.6)
  [0x0000000000AC36EF] ? (splunkd)
  [0x0000000000AC38A6] _ZN22TCMalloc_CrashReporter12PrintfAndDieEPKcz + 150 (splunkd)
  [0x0000000000ABC08B] _ZN123_GLOBAL__N__ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmalloc_large_alloc_report_thresholdE11InvalidFreeEPv + 43 (splunkd)
  [0x0000000000DD7D35] tc_free + 453 (splunkd)
  [0x00007F38F5B4A10D] __res_iclose + 189 (/lib64/libc.so.6)
  [0x00007F38F5B75234] ? (/lib64/libc.so.6)
  [0x00007F38F5B751C2] __libc_thread_freeres + 34 (/lib64/libc.so.6)
  [0x00007F38F7052083] ? (/lib64/libpthread.so.0)
  [0x00007F38F5B3D10D] clone + 109 (/lib64/libc.so.6)
 Linux / myserver / 2.6.27.45-0.1-default / #1 SMP 2010-02-22 16:49:47 +0100 / x86_64
 Last few lines of stderr (may contain info on assertion failure, but also could be old):
    src/tcmalloc.cc:353] Attempt to free invalid pointer: 0x1b00010

 /etc/SuSE-release: SUSE Linux Enterprise Server 11 (x86_64)
 glibc version: 2.9
 glibc release: stable
Threads running: 14
terminating...

mitch · ‎03-09-2011

Hi. I finally have a good answer for your question.

Over the last several months we saw a slow trickle of reports of this crash, but we never had enough information to isolate it. What made it more frustrating is that it seemed to happen to just a few customers, and even for them it seemed to be hard to reproduce. Sometimes they would have splunk crash several times in a row, then the problem would suddenly disappear for no apparent reason.

Finally we had enough reports to piece together the common thread: all of the reports are running 64-bit SuSE 11 of some sort. After a LOT of investigation we found out that it's due to a known bug in SuSE which Novell is planning to fix for OpenSuSE 11.4. They'll presumably fix it in a future SLES version as well.

The good news is that we have identified a workaround in splunk that lets us avoid this bug, and we will include it in all future versions of splunk (i.e. newer than "4.1.7", which is current as of this writing).

If this crash is happening often enough to cause you serious problems (and you can't wait for the next splunk release) you may want to get an early-access testing build from splunk support. Please reference bug "SPL-37331" so they know what issue you're referring to. Again, this is ONLY for 64-bit SuSE installs: no other OSes are affected by this issue.

mitch · ‎03-10-2011

Jason -- of the reports that we've seen, several seem to have popped up when enabling LDAP auth. Other crash reports didn't have LDAP at all. We've also successfully run LDAP on SuSE 11 with splunk 4.1.6 in-house, without problems.

So you're right -- the bug in SuSE's libc isn't related to LDAP. However, it does seem that using LDAP changes the timing of things to help provoke the crash for some environments.

Jason · ‎03-09-2011

It does not have anything to do with AD authentication, as boxes I'm working with use Splunk standard auth.

Jason · ‎03-09-2011

This bug evidently can also manifest itself as a crash on restart, so you may not notice it at first, but crash logs will accumulate in $SPLUNK_HOME/var/log/splunk/

jrodman · ‎03-09-2011

The telltales are __res_iclose and __libc_thread_freeres in the backtrace.
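
If you want to check whether the crash logs that have piled up match this signature, something like the following could help. This is only a rough sketch, not a Splunk tool: it assumes the crash logs sit in $SPLUNK_HOME/var/log/splunk/ with names matching crash-*.log, and it falls back to /opt/splunk when SPLUNK_HOME is not set. Both the file pattern and the fallback path are assumptions, so adjust them for your install. The function names are made up for illustration.

```python
import glob
import os

# Symbols named in this thread as telltales of the crash.
TELLTALES = ("__res_iclose", "__libc_thread_freeres")


def has_telltales(text):
    """Return True if every telltale symbol appears in the crash log text."""
    return all(symbol in text for symbol in TELLTALES)


def find_matching_crash_logs(log_dir, pattern="crash-*.log"):
    """Return sorted paths of crash logs in log_dir that contain both telltales.

    The default file pattern is an assumption about how crash logs are named.
    """
    matches = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", errors="replace") as handle:
            if has_telltales(handle.read()):
                matches.append(path)
    return matches


if __name__ == "__main__":
    # /opt/splunk is an assumed fallback, not something stated in the thread.
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    log_dir = os.path.join(splunk_home, "var", "log", "splunk")
    for path in find_matching_crash_logs(log_dir):
        print(path)
```

Any file it prints has both symbols somewhere in it, which is a hint rather than proof, so it is still worth reading the backtrace yourself before referencing the bug with support.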

dwaddle · ‎09-01-2010

I would highly recommend that you pursue a support case for ANY splunkd crashes. You might get a suitable answer here from someone, but more likely your crash info is going to need to be evaluated by someone who has access to the source code to get more context around the backtrace above.

scarteratwork · ‎09-02-2010

Thanks. Will follow up with Splunk.
