Dear,
I have this crash with the Splunk forwarder 6.2.4 for Linux x64. The forwarder crashes with frequency in a production environment. I don't have a problem with hardware resources.
crash-2015-07-24-07:08:46.log | more
[build 271043] 2015-07-24 07:08:46
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 25804 running under UID 533.
Crashing thread: MainTailingThread
Registers:
RIP: [0x0000003FEC6329A5] gsignal + 53 (/lib64/libc.so.6)
RDI: [0x00000000000064CC]
RSI: [0x00000000000064F0]
RBP: [0x0000003FEC740C2D]
RSP: [0x00007F7966FFE288]
RAX: [0x0000000000000000]
RBX: [0x00007FFF6E2DD739]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007F7966FFF710]
R9: [0x202764696C61765F]
R10: [0x0000000000000008]
R11: [0x0000000000000206]
R12: [0x00000000015B46B5]
R13: [0x0000000001614660]
R14: [0x0000003FEC740C2D]
R15: [0x0000000000000078]
EFL: [0x0000000000000206]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace:
[0x0000003FEC6329A5] gsignal + 53 (/lib64/libc.so.6)
[0x0000003FEC634185] abort + 373 (/lib64/libc.so.6)
[0x0000003FEC62B935] __assert_fail + 245 (/lib64/libc.so.6)
[0x000000000099145A] ? (splunkd)
[0x000000000098D582] _ZNK11TailWatcher12setupConfigsER15WatchedTailFile + 1474 (splunkd)
[0x000000000098D692] _ZNK11TailWatcher19initializeFileStateER15WatchedTailFileRK8Pathname + 66 (splunkd)
[0x00000000009904B2] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 242 (splunkd)
[0x0000000000EC2602] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 114 (splunkd)
[0x0000000000EC3F90] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 464 (splunkd)
[0x0000000000F53B2D] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 301 (splunkd)
[0x0000000000EBD818] _ZN9EventLoop3runEv + 744 (splunkd)
[0x000000000098E9ED] _ZN11TailWatcher3runEv + 141 (splunkd)
[0x000000000099428A] _ZN13TailingThread4mainEv + 154 (splunkd)
[0x0000000000F5165E] _ZN6Thread8callMainEPv + 62 (splunkd)
[0x0000003FECA077E1] ? (/lib64/libpthread.so.0)
[0x0000003FEC6E153D] clone + 109 (/lib64/libc.so.6)
Linux / Dymas / 2.6.32-71.el6.x86_64 / #1 SMP Wed Sep 1 01:33:01 EDT 2010 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2015-07-17 12:30:37.739 -0300 splunkd started (build 271043)
2015-07-17 12:40:16.775 -0300 Interrupt signal received
2015-07-17 14:31:12.322 -0300 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
2015-07-20 22:08:15.270 -0300 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
2015-07-22 13:13:13.179 -0300 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
/etc/redhat-release: Red Hat Enterprise Linux Server release 6.0 (Santiago)
glibc version: 2.12
glibc release: stable
Last errno: 2
Threads running: 28
argv: [splunkd -p 8089 start]
Thread: "MainTailingThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f796701e150:
00000000 10 f7 ff 66 79 7f 00 00 |...fy...|
00000008
First 512 bytes of Timeout object @0x7f7966ffea88:
00000000 10 f2 6d 01 00 00 00 00 00 00 00 00 00 00 00 00 |..m.............|
00000010 38 e8 ff 66 79 7f 00 00 00 00 00 00 00 00 00 00 |8..fy...........|
00000020 00 00 00 00 00 00 00 00 ad 0e b2 55 00 00 00 00 |...........U....|
[ope_splunk@Dymas splunk]$ cat crash-2015-07-24-07:08:46.log
[build 271043] 2015-07-24 07:08:46
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 25804 running under UID 533.
Crashing thread: MainTailingThread
Registers:
RIP: [0x0000003FEC6329A5] gsignal + 53 (/lib64/libc.so.6)
RDI: [0x00000000000064CC]
RSI: [0x00000000000064F0]
RBP: [0x0000003FEC740C2D]
RSP: [0x00007F7966FFE288]
RAX: [0x0000000000000000]
RBX: [0x00007FFF6E2DD739]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007F7966FFF710]
R9: [0x202764696C61765F]
R10: [0x0000000000000008]
R11: [0x0000000000000206]
R12: [0x00000000015B46B5]
R13: [0x0000000001614660]
R14: [0x0000003FEC740C2D]
R15: [0x0000000000000078]
EFL: [0x0000000000000206]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace:
[0x0000003FEC6329A5] gsignal + 53 (/lib64/libc.so.6)
[0x0000003FEC634185] abort + 373 (/lib64/libc.so.6)
[0x0000003FEC62B935] __assert_fail + 245 (/lib64/libc.so.6)
[0x000000000099145A] ? (splunkd)
[0x000000000098D582] _ZNK11TailWatcher12setupConfigsER15WatchedTailFile + 1474 (splunkd)
[0x000000000098D692] _ZNK11TailWatcher19initializeFileStateER15WatchedTailFileRK8Pathname + 66 (splunkd)
[0x00000000009904B2] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 242 (splunkd)
[0x0000000000EC2602] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 114 (splunkd)
[0x0000000000EC3F90] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 464 (splunkd)
[0x0000000000F53B2D] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 301 (splunkd)
[0x0000000000EBD818] _ZN9EventLoop3runEv + 744 (splunkd)
[0x000000000098E9ED] _ZN11TailWatcher3runEv + 141 (splunkd)
[0x000000000099428A] _ZN13TailingThread4mainEv + 154 (splunkd)
[0x0000000000F5165E] _ZN6Thread8callMainEPv + 62 (splunkd)
[0x0000003FECA077E1] ? (/lib64/libpthread.so.0)
[0x0000003FEC6E153D] clone + 109 (/lib64/libc.so.6)
Linux / Dymas / 2.6.32-71.el6.x86_64 / #1 SMP Wed Sep 1 01:33:01 EDT 2010 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2015-07-17 12:30:37.739 -0300 splunkd started (build 271043)
2015-07-17 12:40:16.775 -0300 Interrupt signal received
2015-07-17 14:31:12.322 -0300 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
2015-07-20 22:08:15.270 -0300 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
2015-07-22 13:13:13.179 -0300 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
/etc/redhat-release: Red Hat Enterprise Linux Server release 6.0 (Santiago)
glibc version: 2.12
glibc release: stable
Last errno: 2
Threads running: 28
argv: [splunkd -p 8089 start]
Thread: "MainTailingThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f796701e150:
00000000 10 f7 ff 66 79 7f 00 00 |...fy...|
00000008
First 512 bytes of Timeout object @0x7f7966ffea88:
00000000 10 f2 6d 01 00 00 00 00 00 00 00 00 00 00 00 00 |..m.............|
00000010 38 e8 ff 66 79 7f 00 00 00 00 00 00 00 00 00 00 |8..fy...........|
00000020 00 00 00 00 00 00 00 00 ad 0e b2 55 00 00 00 00 |...........U....|
00000030 55 7f 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 |U...............|
00000040 80 ea ff 66 79 7f 00 00 20 ec ff 66 79 7f 00 00 |...fy... ..fy...|
00000050 01 00 00 00 01 00 00 00 c0 1c 46 64 79 7f 00 00 |..........Fdy...|
00000060 c0 a7 02 66 79 7f 00 00 80 2a 46 64 79 7f 00 00 |...fy....*Fdy...|
00000070 80 2a 46 64 79 7f 00 00 00 eb ff 66 79 7f 00 00 |.*Fdy......fy...|
00000080 00 eb ff 66 79 7f 00 00 10 eb ff 66 79 7f 00 00 |...fy......fy...|
00000090 10 eb ff 66 79 7f 00 00 20 eb ff 66 79 7f 00 00 |...fy... ..fy...|
000000a0 20 eb ff 66 79 7f 00 00 40 29 46 64 79 7f 00 00 | ..fy...@)Fdy...|
000000b0 40 29 46 64 79 7f 00 00 00 00 00 00 00 00 00 00 |@)Fdy...........|
000000c0 00 b0 05 66 79 7f 00 00 14 10 00 00 00 00 00 00 |...fy...........|
000000d0 51 fd df 2c 00 00 00 00 78 49 61 01 00 00 00 00 |Q..,....xIa.....|
000000e0 58 10 02 66 79 7f 00 00 00 00 00 00 00 00 00 00 |X..fy...........|
000000f0 00 00 00 00 00 00 00 00 80 45 2b 66 79 7f 00 00 |.........E+fy...|
00000100 f0 44 2b 66 79 7f 00 00 b0 45 2b 66 79 7f 00 00 |.D+fy....E+fy...|
00000110 09 00 00 00 00 00 00 00 00 22 02 66 79 7f 00 00 |.........".fy...|
00000120 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |d...............|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 b8 eb ff 66 79 7f 00 00 b8 eb ff 66 79 7f 00 00 |...fy......fy...|
00000150 00 00 00 00 00 00 00 00 80 f0 02 66 79 7f 00 00 |...........fy...|
00000160 60 01 82 6d 79 7f 00 00 00 00 00 00 00 00 00 00 |`..my...........|
00000170 d8 2b 38 73 79 7f 00 00 00 22 02 66 79 7f 00 00 |.+8sy....".fy...|
00000180 40 26 13 66 79 7f 00 00 70 26 13 66 79 7f 00 00 |@&.fy...p&.fy...|
00000190 80 26 13 66 79 7f 00 00 2a 00 00 00 10 00 00 00 |.&.fy...*.......|
000001a0 80 1c 46 64 79 7f 00 00 00 00 00 00 00 00 00 00 |..Fdy...........|
000001b0 88 ea ff 66 79 7f 00 00 00 00 00 00 00 00 00 00 |...fy...........|
000001c0 22 00 00 00 79 7f 00 00 c0 40 03 66 79 7f 00 00 |"...y....@.fy...|
000001d0 10 00 00 00 aa aa aa aa 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 79 7f 00 00 60 e9 ff 66 79 7f 00 00 |....y...`..fy...|
000001f0 10 f7 ff 66 79 7f 00 00 00 00 00 00 00 00 00 00 |...fy...........|
00000200
FilesystemChangeWatcher: _timeoutActive=Y, _throttled=N, _waitingForNotifyCount=1
EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=N
WatchedTailFile-WatchedFileState: path="/opt/splunkforwarder/var/run/nmon/var/csv_repository/Dymas_24_JUL_2015_053319_FILE_444882_20150724070843.nmon.csv", flags=0x24003
First 144 bytes of PathnameStat @0x7f7964462ac8:
00000000 31 35 3a 32 33 3a 35 38 2e 34 39 37 20 2d 30 33 |15:23:58.497 -03|
00000010 30 30 20 49 4e 46 4f 20 20 53 65 72 76 65 72 43 |00 INFO ServerC|
00000020 6f 6e 66 69 67 20 2d 20 4d 79 20 6e 65 77 6c 79 |onfig - My newly|
00000030 20 67 65 6e 65 72 61 74 65 64 20 47 55 49 44 20 | generated GUID |
00000040 69 73 20 37 30 45 35 45 37 38 42 2d 42 38 46 33 |is 70E5E78B-B8F3|
00000050 2d 34 36 35 38 2d 38 30 43 30 2d 41 41 32 36 34 |-4658-80C0-AA264|
00000060 35 43 37 35 46 43 42 0a 30 37 2d 31 34 2d 32 30 |5C75FCB.07-14-20|
00000070 31 35 20 31 35 3a 32 33 3a 35 38 2e 34 39 38 20 |15 15:23:58.498 |
00000080 2d 30 33 30 30 20 49 4e 46 4f 20 20 53 65 72 76 |-0300 INFO Serv|
00000090
FilesystemChangeWatcher: _timeoutActive=Y, _throttled=N, _waitingForNotifyCount=1
EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=N
Timeout: _when = 2321387029209298234.5641075399597436973, _initialMsec = 8534995651516190534
file-in: _initialized=Y, _lastCharWasNewline=N, _lastReadHadNulls=N, _wasCrcConflict=N, _warned=N
_nullsWarned=N, _wasTooNew=N, _exists=N, _noDebug=N
_hadExplicitSource=N, _crossedInitCrcLenBoundary=N, _classifiedAtLeastOnce=N, _fileReplaced=N, _readPathAfterRealEOF=N
_onlyNotifiedOnce=Y, _isArchive=N, _isCached=111213, _unowned=N, _deleteOnEOF=N
_overrideDeleteOnEOF=N, _doNotDeleteChildren=N, _readFromEnd=N, _readIrregardless=N
_fileCheckMethod=0, _crcSalt=<null>, _origPath=<null>
_bytesRead=0, _storingBytesRead=0, _initCrc=0x0, _seekCrc=0x0
_filenameCrc=0x91b72885e4a713ee, _fallbackCrc=0x0, _lastEOFTime=<zero>, _modTime=<zero>
_eofSeconds=3, _ignoreThresh=<zero>, _initCrcBytes=256, _initCrcForBatch=0x0
_pendingMetadata=<null>
_prevFd=-1, _pdModels=[0 PDs]
_rescheduleDelay=1000, _rescheduleTarget=<zero>, _name=/opt/splunkforwarder/var/run/nmon/var/csv_repository/Dymas_24_JUL_2015_053319_FILE_444882_20150724070843.nmon.csv, _statusName=
_st=[dev=64768, ino=1850031, mode=100600, size=1130, mtime=1437732465, owner=533, group=533]
_toStringPrefix=state=0x0x7f7964462a40, _backoff=0
_stdataInputHeaderProcessing=[]
_detectTrailingNulls=N, _detectReadingFromOffSet=N, _readAndSkipHeader=N, _uniqueId=72747
_rawPath=
x86 CPUID registers:
0: 0000000B 756E6547 6C65746E 49656E69
1: 000206C2 10200800 029EE3FF BFEBFBFF
2: 55035A01 00F0B2FF 00000000 00CA0000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000040 00000040 00000003 00001120
6: 00000007 00000002 00000009 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300403 00000004 00000000 00000603
B: 00000000 00000000 000000AD 00000010
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 20202020 20202020 58202020
80000004: 30353635 20402020 37362E32 007A4847
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003028 00000000 00000000 00000000
terminating...
The crash pattern is similar to a known bug in splunk 6.2.4, the fix will be in the next version.
6.2.4.1 patch is already available to address this issue. Ask splunk support to provide bits.
6.2.5 is globally available for download, with the patch in change log:
2015-8-11 SPL-104017 Splunkd crash due to assertion failure in Tailing.
Hi armonsal,
Splunk 6.2.4 seems to have introduced a bug that causes splunkd to crash when a monitor watches for files that may be deleted (maybe too fast ?)
I see in your output that the crash is related with the Nmon Performance Monitor:
WatchedTailFile-WatchedFileState: path="/opt/splunkforwarder/var/run/nmon/var/csv_repository/Dymas_24_JUL_2015_053319_FILE_444882_20150724070843.nmon.csv", flags=0x24003
The crash is not directly caused by Nmon App, until recently the processing steps used to create csv files in the same directory than splunk watches for, in some cases empty files could be created and deleted by nmon2csv converters, which causes splunkd in 6.2.4 to crash. (which is totally unexpected and wasn't the case before)
I have released on 5 august 2014 an hotfix release with a workaround to manage this, now files are moved from a working directory to final directory splunk watches for, which solves the issue from splunkd.
Please update to Nmon Perf Monitor 1.6.04 and your problem will be solved.
Guilhem
The crash pattern is similar to a known bug in splunk 6.2.4, the fix will be in the next version.
Hi,
big thanks for that information...
sincerely oliver