SPLUNKD Agent crashing frequently

New Member

Agents are installed on three servers and capture WebSphere Application Server logs. Nothing has changed in the environment except that the WAS JVM was patched and then restarted. Since then, the agent has crashed every other day, and I cannot work out the cause from the logs. I don't think the JVM patching is relevant, but I mention it anyway. Here is a snippet from a recently crashed agent:

version Splunk 4.2.1 (build 98164)

WAS 7.0.0.31

OS Linux 2.6.18-348.3.1.el5 #1 SMP Tue Mar 5 13:19:32 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

LOG

Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 29658 running under UID 2441.
Crashing thread: MainTailingThread
Registers:
RIP: [0x0000003809030295] gsignal + 53 (/lib64/libc.so.6)
RDI: [0x00000000000073DA]
RSI: [0x00000000000073F4]
RBP: [0x00002B4911006940]
RSP: [0x00002B4911005628]
RAX: [0x0000000000000000]
RBX: [0x00002B49110056D0]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x0000000000000080]
R9: [0x0101010101010101]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00007FFF6C43AA7D]
R13: [0x0000000000F984E8]
R14: [0x000000000000021A]
R15: [0x0000000000F97D18]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]

OS: Linux
Arch: x86-64

Backtrace:
[0x0000003809031D40] abort + 272 (/lib64/libc.so.6)
[0x0000003809029716] __assert_fail + 246 (/lib64/libc.so.6)
[0x0000000000671673] _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIPP15WatchedTailFileSt6vectorIS3_SaIS3_EEEEl22WTFTimestampComparatorEvT_SA_T0_T1_ + 515 (splunkd)
[0x0000000000665C06] _ZN10TailReader11trimOpenFdsEv + 230 (splunkd)
[0x0000000000665FDA] _ZN10TailReader8readDataER15WatchedTailFileP11TailWatcher + 650 (splunkd)
[0x0000000000666448] _ZN10TailReader8readFileER15WatchedTailFileP11TailWatcher + 184 (splunkd)
[0x0000000000666592] _ZN11TailWatcher8readFileER15WatchedTailFile + 82 (splunkd)
[0x0000000000668419] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 1161 (splunkd)
[0x0000000000B6DAFC] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 428 (splunkd)
[0x0000000000BB8E03] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 227 (splunkd)
[0x0000000000B673D8] _ZN9EventLoop3runEv + 216 (splunkd)
[0x000000000066B7AF] _ZN11TailWatcher3runEv + 143 (splunkd)
[0x000000000066D7C1] _ZN13TailingThread4mainEv + 289 (splunkd)
[0x0000000000BB6892] _ZN6Thread8callMainEPv + 66 (splunkd)
[0x0000003809C0683D] ? (/lib64/libpthread.so.0)
[0x00000038090D500D] clone + 109 (/lib64/libc.so.6)
Linux / alt-met-asol-was03 / 2.6.18-348.3.1.el5 / #1 SMP Tue Mar 5 13:19:32 EST 2013 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2013-02-28 13:30:03.674 +0100 splunkd started (build 98164)
2013-04-15 13:46:42.551 +0200 Interrupt signal received
2013-04-15 13:50:48.847 +0200 splunkd started (build 98164)
2013-08-29 04:35:38.328 +0200 splunkd started (build 98164)
2014-02-24 11:45:17.483 +0100 splunkd started (build 98164)
2014-02-26 15:10:11.065 +0100 splunkd started (build 98164)
2014-03-03 11:45:29.363 +0100 Interrupt signal received
2014-03-03 11:45:38.980 +0100 splunkd started (build 98164)
2014-03-05 10:07:33.392 +0100 splunkd started (build 98164)
splunkd: /opt/splunk/p4/splunk/branches/hammer/src/pipeline/input/Tailing.cpp:538: bool WTFTimestampComparator::operator()(const WatchedTailFile*, const WatchedTailFile*): Assertion `!x->getLastEOFTime().isZero() && !y->getLastEOFTime().isZero()' failed.

/etc/redhat-release: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
glibc version: 2.5
glibc release: stable
Threads running: 25
argv: [splunkd -p 8090 restart]
terminating...

Re: SPLUNKD Agent crashing frequently

Splunk Employee

This looks like an old bug in the 4.2.x release line. Upgrade to at least the last 4.3.x revision, or newer, and check that splunkd has sufficient file descriptors available under the account it runs as.
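
One way to do the file descriptor check is sketched below in Python. It is only an illustration and not a Splunk tool: the function names `parse_max_open_files` and `fd_usage` are made up for this example. It assumes a Linux host where `/proc/<pid>/limits` and `/proc/<pid>/fd` are available, a Python 3 interpreter, and that you run it as root or as the same user as splunkd. Take the PID from `ps`.

```python
import os


def parse_max_open_files(limits_text):
    """Extract the (soft, hard) 'Max open files' limits from the text of
    /proc/<pid>/limits. A value of 'unlimited' is returned as None."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            soft, hard = fields[0], fields[1]

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(soft), to_int(hard)
    return None


def fd_usage(pid):
    """Return the number of descriptors a process currently holds open,
    together with its soft and hard open-file limits."""
    with open("/proc/%d/limits" % pid) as fh:
        limits = parse_max_open_files(fh.read())
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    soft, hard = limits if limits else (None, None)
    return {"open": open_fds, "soft_limit": soft, "hard_limit": hard}
```

Calling `fd_usage(<splunkd pid>)` on each of the three servers shows how close the agent is to its soft limit. If the open count sits near the limit, raising the limit for the account that runs splunkd (for example with `ulimit -n` before starting it) is the usual remedy.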