Hey all,
We have seen these crashes on two servers over the past few days. Is there anything in the crash log that would help identify the root cause? I have looked through the log, but nothing jumps out at me.
See logs below...
Thx,
JB
[splunk@rtlvpxawsb splunk]$ cat crash-2016-01-31-00\:10\:01.log
[build 271043] 2016-01-31 00:10:01
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 10897 running under UID 49436.
Crashing thread: MainTailingThread
Registers:
RIP: [0x0000003E7CA32625] gsignal + 53 (/lib64/libc.so.6)
RDI: [0x0000000000002A91]
RSI: [0x0000000000002AB0]
RBP: [0x0000000001612A40]
RSP: [0x00007FD2AA5F9278]
RAX: [0x0000000000000000]
RBX: [0x00007FD2B3ECB000]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0xFEFEFEFEFEFEFEFF]
R9: [0x00007FD2B3F61F60]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00000000015B46B5]
R13: [0x0000000001614660]
R14: [0x00007FD2AA5F9B30]
R15: [0x00007FD2A98A42C0]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace:
[0x0000003E7CA32625] gsignal + 53 (/lib64/libc.so.6)
[0x0000003E7CA33E05] abort + 373 (/lib64/libc.so.6)
[0x0000003E7CA2B74E] ? (/lib64/libc.so.6)
[0x0000003E7CA2B810] __assert_perror_fail + 0 (/lib64/libc.so.6)
[0x000000000099145A] ? (splunkd)
[0x000000000098D582] _ZNK11TailWatcher12setupConfigsER15WatchedTailFile + 1474 (splunkd)
[0x000000000098D692] _ZNK11TailWatcher19initializeFileStateER15WatchedTailFileRK8Pathname + 66 (splunkd)
[0x00000000009904B2] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 242 (splunkd)
[0x0000000000EC2602] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 114 (splunkd)
[0x0000000000EC3F90] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 464 (splunkd)
[0x0000000000F53B2D] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 301 (splunkd)
[0x0000000000EBD818] _ZN9EventLoop3runEv + 744 (splunkd)
[0x000000000098E9ED] _ZN11TailWatcher3runEv + 141 (splunkd)
[0x000000000099428A] _ZN13TailingThread4mainEv + 154 (splunkd)
[0x0000000000F5165E] _ZN6Thread8callMainEPv + 62 (splunkd)
[0x0000003E7CE079D1] ? (/lib64/libpthread.so.0)
[0x0000003E7CAE88FD] clone + 109 (/lib64/libc.so.6)
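The splunkd frames in the backtrace above are Itanium-ABI mangled C++ names. Piping the log through `c++filt` (GNU binutils) prints full signatures such as `TailWatcher::setupConfigs(WatchedTailFile&) const`. As a rough alternative, here is a minimal Python sketch that recovers only the qualified name. It assumes simple nested names like the ones in this log (no templates or substitutions), and `nested_name` is a hypothetical helper, not part of Splunk.

```python
import re

# Itanium ABI source names are length-prefixed: "11TailWatcher" -> "TailWatcher"
_LENGTH = re.compile(r"\d+")


def nested_name(mangled: str) -> str:
    """Return 'Class::method' from a simple mangled symbol; parameter types are ignored.

    Sketch only: templates, substitutions (S_), and operators are not handled.
    """
    if not mangled.startswith("_Z"):
        return mangled  # not a mangled C++ name; return it unchanged
    pos = 2
    nested = mangled.startswith("N", pos)  # 'N ... E' wraps a qualified (nested) name
    if nested:
        pos += 1
        # skip CV-qualifiers of a member function: r=restrict, V=volatile, K=const
        while pos < len(mangled) and mangled[pos] in "rVK":
            pos += 1
    parts = []
    while True:
        m = _LENGTH.match(mangled, pos)
        if m is None:
            break  # reached 'E' (end of nested name) or the parameter list
        length = int(m.group())
        start = m.end()
        parts.append(mangled[start:start + length])
        pos = start + length
        if not nested:
            break  # an unnested name has exactly one component
    return "::".join(parts)


if __name__ == "__main__":
    for sym in ("_ZNK11TailWatcher12setupConfigsER15WatchedTailFile",
                "_ZN11TailWatcher3runEv",
                "_ZN9EventLoop3runEv"):
        print(nested_name(sym))
```

Read bottom-up, the demangled frames show the tailing thread's event loop (`TailWatcher::run`, `EventLoop::run`, `TimeoutHeap::runExpiredTimeouts`) firing a filesystem-change timeout that calls `TailWatcher::fileChanged`, then `initializeFileState`, then `setupConfigs`, which calls an unsymbolized splunkd function that ends in glibc's assertion handler and `abort`.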
Linux / rtlvpxawsb.labcorp.com / 2.6.32-504.16.2.el6.x86_64 / #1 SMP Tue Mar 10 17:01:00 EDT 2015 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2015-10-14 16:39:28.077 -0400 splunkd started (build 271043)
Conf mutator lockfile has disappeared; error condition possible.
2015-10-15 15:07:01.431 -0400 splunkd started (build 271043)
Conf mutator lockfile has disappeared; error condition possible.
2015-10-15 16:25:07.121 -0400 splunkd started (build 271043)
Conf mutator lockfile has disappeared; error condition possible.
2015-10-29 14:25:14.432 -0400 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
2015-12-15 08:16:03.012 -0500 splunkd started (build 271043)
splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
/etc/redhat-release: Red Hat Enterprise Linux Server release 6.6 (Santiago)
glibc version: 2.12
glibc release: stable
Last errno: 2
Threads running: 30
argv: [splunkd -p 8089 start]
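Two details here stand out. The stderr tail shows earlier runs (2015-10-29 and 2015-12-15) dying on the same assertion, the `StatWrap::isDir()` check on `_valid` at Tailing.h line 120, and `Last errno: 2` is ENOENT ("No such file or directory") on Linux. The log itself warns that the stderr lines may be old, so treat this as a lead rather than proof. A minimal sketch for pulling those fields out of a crash log, assuming glibc's standard `assert()` message format; `ASSERT_RE` and `summarize` are hypothetical names:

```python
import errno
import re

# glibc assert() prints: "<prog>: <file>:<line>: <function>: Assertion `<expr>' failed."
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:]+): (?P<file>[^:]+):(?P<line>\d+): (?P<func>.+): "
    r"Assertion `(?P<expr>.*)' failed\.$"
)


def summarize(log_text: str) -> dict:
    """Collect assertion failures and the 'Last errno' value from crash-log text."""
    assertions = []
    last_errno = None
    for line in log_text.splitlines():
        m = ASSERT_RE.match(line.strip())
        if m:
            assertions.append((m["file"], int(m["line"]), m["func"], m["expr"]))
        elif line.startswith("Last errno:"):
            last_errno = int(line.split(":", 1)[1])  # int() tolerates the leading space
    return {
        "assertions": assertions,
        "last_errno": last_errno,
        # errno.errorcode maps numbers to symbolic names, e.g. 2 -> 'ENOENT'
        "errno_name": errno.errorcode.get(last_errno) if last_errno is not None else None,
    }
```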
Thread: "MainTailingThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7fd2b1c6d150:
00000000 00 a7 5f aa d2 7f 00 00 |.._.....|
00000008
First 512 bytes of Timeout object @0x7fd2aa5f9a88:
00000000 10 f2 6d 01 00 00 00 00 00 00 00 00 00 00 00 00 |..m.............|
00000010 38 98 5f aa d2 7f 00 00 00 00 00 00 00 00 00 00 |8._.............|
00000020 00 00 00 00 00 00 00 00 29 97 ad 56 00 00 00 00 |........)..V....|
00000030 3e 94 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 |>...............|
00000040 80 9a 5f aa d2 7f 00 00 20 9c 5f aa d2 7f 00 00 |.._..... ._.....|
00000050 01 00 00 00 01 00 00 00 c0 a7 14 a9 d2 7f 00 00 |................|
00000060 80 d6 1c b0 d2 7f 00 00 c0 28 10 a9 d2 7f 00 00 |.........(......|
00000070 00 43 8a a9 d2 7f 00 00 00 9b 5f aa d2 7f 00 00 |.C........_.....|
00000080 00 9b 5f aa d2 7f 00 00 10 9b 5f aa d2 7f 00 00 |.._......._.....|
00000090 10 9b 5f aa d2 7f 00 00 20 9b 5f aa d2 7f 00 00 |.._..... ._.....|
000000a0 20 9b 5f aa d2 7f 00 00 c0 41 8a a9 d2 7f 00 00 | ._......A......|
000000b0 40 3f 8a a9 d2 7f 00 00 00 00 00 00 00 00 00 00 |@?..............|
000000c0 00 e0 0d ab d2 7f 00 00 14 10 00 00 00 00 00 00 |................|
000000d0 51 f3 4b 55 00 00 00 00 78 49 61 01 00 00 00 00 |Q.KU....xIa.....|
000000e0 d8 f6 16 b0 d2 7f 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 80 4b 1f ab d2 7f 00 00 |.........K......|
00000100 10 4c 1f ab d2 7f 00 00 c0 4d 1f ab d2 7f 00 00 |.L.......M......|
00000110 0c 00 00 00 00 00 00 00 00 e4 1e b0 d2 7f 00 00 |................|
00000120 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |d...............|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 b8 9b 5f aa d2 7f 00 00 b8 9b 5f aa d2 7f 00 00 |.._......._.....|
00000150 00 00 00 00 00 00 00 00 80 63 1a b0 d2 7f 00 00 |.........c......|
00000160 e0 8d 82 af d2 7f 00 00 00 00 00 00 00 00 00 00 |................|
00000170 18 2e 45 b2 d2 7f 00 00 00 e4 1e b0 d2 7f 00 00 |..E.............|
00000180 00 48 0c ab d2 7f 00 00 30 48 0c ab d2 7f 00 00 |.H......0H......|
00000190 40 48 0c ab d2 7f 00 00 26 00 00 00 10 00 00 00 |@H......&.......|
000001a0 80 a7 14 a9 d2 7f 00 00 00 00 00 00 00 00 00 00 |................|
000001b0 88 9a 5f aa d2 7f 00 00 00 00 00 00 00 00 00 00 |.._.............|
000001c0 25 00 00 00 d2 7f 00 00 40 6b 2e b0 d2 7f 00 00 |%.......@k......|
000001d0 10 00 00 00 aa aa aa aa 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 d2 7f 00 00 60 99 5f aa d2 7f 00 00 |........`._.....|
000001f0 00 a7 5f aa d2 7f 00 00 00 00 00 00 00 00 00 00 |.._.............|
00000200
FilesystemChangeWatcher: _timeoutActive=Y, _throttled=N, _waitingForNotifyCount=1
EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=N
WatchedTailFile-WatchedFileState: path="/etc/httpd/logs/stage-phoenix.labcorp.com-ssl-request.log.5", flags=0x24023
First 144 bytes of PathnameStat @0x7fd2a98a4348:
00000000 30 2c 20 73 6f 75 72 63 65 50 6f 72 74 3d 38 30 |0, sourcePort=80|
00000010 38 39 2c 20 64 65 73 74 49 70 3d 31 30 2e 31 31 |89, destIp=10.11|
00000020 31 2e 31 2e 31 39 35 2c 20 64 65 73 74 50 6f 72 |1.1.195, destPor|
00000030 74 3d 39 39 39 37 2c 20 5f 74 63 70 5f 42 70 73 |t=9997, _tcp_Bps|
00000040 3d 32 34 31 39 2e 34 37 2c 20 5f 74 63 70 5f 4b |=2419.47, _tcp_K|
00000050 42 70 73 3d 32 2e 33 36 2c 20 5f 74 63 70 5f 61 |Bps=2.36, _tcp_a|
00000060 76 67 5f 74 68 72 75 70 75 74 3d 32 2e 33 36 2c |vg_thruput=2.36,|
00000070 20 5f 74 63 70 5f 4b 70 72 6f 63 65 73 73 65 64 | _tcp_Kprocessed|
00000080 3d 37 31 2c 20 5f 74 63 70 5f 65 70 73 3d 31 2e |=71, _tcp_eps=1.|
00000090
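The 144 bytes labeled PathnameStat do not look like binary stat data. Decoded, they read as plain text, `0, sourcePort=8089, destIp=10.111.1.195, destPort=9997, _tcp_Bps=2419.47, _tcp_KBps=2.36, ...`, which resembles TCP output throughput statistics. One possible, unconfirmed reading is that this memory had already been freed and reused by the time the crash report was written, which would fit a stat object whose `_valid` check failed. This small Python sketch reassembles the bytes from dump lines in this `offset hex... |ascii|` layout; `decode_hexdump` is a hypothetical helper.

```python
def decode_hexdump(lines):
    """Reassemble raw bytes from 'offset  hex bytes  |ascii|' dump lines."""
    data = bytearray()
    for line in lines:
        # keep only the part before the ASCII column, then drop the offset field
        fields = line.split("|", 1)[0].split()
        if len(fields) < 2:
            continue  # a bare trailing offset line such as '00000090' carries no bytes
        data.extend(bytes.fromhex("".join(fields[1:])))
    return bytes(data)
```

Feeding it the nine PathnameStat rows from the log and calling `.decode("ascii")` on the result gives the text quoted above.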
FilesystemChangeWatcher: _timeoutActive=Y, _throttled=N, _waitingForNotifyCount=1
EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=N
Timeout: _when = 2321382613982983482.5641075399597568045, _initialMsec = 8247328199548096326
file-in: _initialized=Y, _lastCharWasNewline=N, _lastReadHadNulls=N, _wasCrcConflict=N, _warned=N
_nullsWarned=N, _wasTooNew=N, _exists=N, _noDebug=N
_hadExplicitSource=N, _crossedInitCrcLenBoundary=N, _classifiedAtLeastOnce=N, _fileReplaced=N, _readPathAfterRealEOF=N
_onlyNotifiedOnce=Y, _isArchive=N, _isCached=111213, _unowned=N, _deleteOnEOF=N
_overrideDeleteOnEOF=N, _doNotDeleteChildren=N, _readFromEnd=N, _readIrregardless=N
_fileCheckMethod=0, _crcSalt=<null>, _origPath=<null>
_bytesRead=0, _storingBytesRead=0, _initCrc=0x0, _seekCrc=0x0
_filenameCrc=0x16ab246dab3357c1, _fallbackCrc=0x0, _lastEOFTime=<zero>, _modTime=<zero>
_eofSeconds=3, _ignoreThresh=<zero>, _initCrcBytes=256, _initCrcForBatch=0x0
_pendingMetadata=<null>
_prevFd=-1, _pdModels=[0 PDs]
_rescheduleDelay=1000, _rescheduleTarget=<zero>, _name=/etc/httpd/logs/stage-phoenix.labcorp.com-ssl-request.log.5, _statusName=
_st=[dev=64773, ino=36, mode=100644, size=7204328, mtime=1453796447, owner=0, group=3000]
_toStringPrefix=state=0x0x7fd2a98a42c0, _backoff=0
_stdataInputHeaderProcessing=[]
_detectTrailingNulls=N, _detectReadingFromOffSet=N, _readAndSkipHeader=N, _uniqueId=439908
_rawPath=
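The `_st` line holds the last stat data recorded for the tailed file. Assuming `mode` is printed in octal (as `st_mode` usually is) and `mtime` is Unix epoch seconds, it describes a regular `-rw-r--r--` file last modified 2016-01-26 08:20:47 UTC, about five days before the crash, which fits a rotated `.log.5` file. Note that `_exists=N` is set in the same state dump. A minimal sketch of the decoding; `describe_st` is a hypothetical helper:

```python
import stat
from datetime import datetime, timezone


def describe_st(mode_octal: str, mtime: int):
    """Turn the crash log's mode/mtime fields into readable values.

    Assumes mode is octal text (e.g. '100644') and mtime is epoch seconds.
    """
    mode = int(mode_octal, 8)
    return stat.filemode(mode), datetime.fromtimestamp(mtime, tz=timezone.utc)


if __name__ == "__main__":
    perms, modified = describe_st("100644", 1453796447)
    print(perms, modified.isoformat())
```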
x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000206D7 02010800 9E982203 0FABFBFF
2: 76035A01 00F0B2FF 00000000 00CA0000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000077 00000002 00000009 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000001 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000000 000000FD 00000002
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 28100800
80000002: 20202020 49202020 6C65746E 20295228
80000003: 6E6F6558 20295228 20555043 342D3545
80000004: 20303436 20402030 30342E32 007A4847
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003028 00000000 00000000 00000000
terminating...
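The CPUID section is probably irrelevant to this assertion, but leaves 0x80000002 through 0x80000004 encode the processor brand string, which can help when comparing the two affected servers. Each row lists EAX, EBX, ECX, EDX, and each 32-bit register stores its characters lowest byte first. A sketch (`cpuid_brand` is a hypothetical helper):

```python
import struct


def cpuid_brand(rows):
    """Rebuild the CPU brand string from CPUID leaves 0x80000002-0x80000004.

    Each row is 'LEAF: EAX EBX ECX EDX' (the 'LEAF:' prefix is optional).
    x86 is little-endian, so '<4I' emits each register's bytes lowest first.
    """
    raw = b"".join(
        struct.pack("<4I", *(int(word, 16) for word in row.split(":", 1)[-1].split()))
        for row in rows
    )
    # the string is NUL-terminated and may be padded with leading spaces
    return raw.split(b"\0", 1)[0].decode("ascii").strip()
```

For the three rows in this log it yields `Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz`.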
You might be hitting bug SPL-104017. Version 6.2.5 and later include a fix for it, so I would suggest upgrading if possible.
Here are some other people discussing the same issue:
https://answers.splunk.com/answers/290645/why-is-our-splunk-624-forwarder-on-linux-crashing.html
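The build path in the stderr lines (`/home/build/build-src/6.2.4/...`) suggests this host runs 6.2.4. If you manage many forwarders, a trivial check against the 6.2.5 threshold from the reply above might look like this. It assumes plain dotted numeric version strings, and `has_spl_104017_fix` is a hypothetical name:

```python
def has_spl_104017_fix(version: str) -> bool:
    """True if a dotted version such as '6.2.4' is at or above 6.2.5.

    Compares numerically, so '6.2.10' correctly counts as newer than '6.2.5'.
    """
    parts = tuple(int(p) for p in version.split("."))
    return parts >= (6, 2, 5)
```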
I have a very similar problem. The crash occurs when we move day-old Apache log files out of the active folder into an archive folder. The log file identified in the crash log is one of the no-longer-active Apache logs.
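That pattern would be consistent with `Last errno: 2` (ENOENT) in the crash log: a file that is moved after the tailer notices it but before it re-stats the old path. The sketch below is not Splunk's code; it only reproduces the OS-level behavior with hypothetical directory and file names, showing that `os.stat` on the old path fails with ENOENT once the file has been moved to an archive folder.

```python
import errno
import os
import shutil
import tempfile


def stat_after_move():
    """Return the errno raised when stat'ing a path whose file was moved away."""
    with tempfile.TemporaryDirectory() as root:
        active = os.path.join(root, "active")    # hypothetical watched folder
        archive = os.path.join(root, "archive")  # hypothetical archive folder
        os.mkdir(active)
        os.mkdir(archive)
        path = os.path.join(active, "access.log.1")
        with open(path, "w") as f:
            f.write("127.0.0.1 - - GET /\n")
        os.stat(path)  # succeeds: the file is still in the watched folder
        shutil.move(path, os.path.join(archive, "access.log.1"))  # simulate archiving
        try:
            os.stat(path)  # a watcher still holding the old path now fails
        except FileNotFoundError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    code = stat_after_move()
    print(code, errno.errorcode.get(code))
```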