Deployment Architecture

Splunk 6.2.4 Light Forwarder Crashing

butzowj
Path Finder

Hey all,

We have seen these crashes on two servers over the past few days. Is there anything in the crash log that would help identify a root cause? I have browsed through the log, but nothing jumps out at me.

See the crash log below.

Thx,
JB

[splunk@rtlvpxawsb splunk]$ cat crash-2016-01-31-00\:10\:01.log
[build 271043] 2016-01-31 00:10:01
Received fatal signal 6 (Aborted).
 Cause:
   Signal sent by PID 10897 running under UID 49436.
 Crashing thread: MainTailingThread
 Registers:
    RIP:  [0x0000003E7CA32625] gsignal + 53 (/lib64/libc.so.6)
    RDI:  [0x0000000000002A91]
    RSI:  [0x0000000000002AB0]
    RBP:  [0x0000000001612A40]
    RSP:  [0x00007FD2AA5F9278]
    RAX:  [0x0000000000000000]
    RBX:  [0x00007FD2B3ECB000]
    RCX:  [0xFFFFFFFFFFFFFFFF]
    RDX:  [0x0000000000000006]
    R8:  [0xFEFEFEFEFEFEFEFF]
    R9:  [0x00007FD2B3F61F60]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000202]
    R12:  [0x00000000015B46B5]
    R13:  [0x0000000001614660]
    R14:  [0x00007FD2AA5F9B30]
    R15:  [0x00007FD2A98A42C0]
    EFL:  [0x0000000000000202]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace:
  [0x0000003E7CA32625] gsignal + 53 (/lib64/libc.so.6)
  [0x0000003E7CA33E05] abort + 373 (/lib64/libc.so.6)
  [0x0000003E7CA2B74E] ? (/lib64/libc.so.6)
  [0x0000003E7CA2B810] __assert_perror_fail + 0 (/lib64/libc.so.6)
  [0x000000000099145A] ? (splunkd)
  [0x000000000098D582] _ZNK11TailWatcher12setupConfigsER15WatchedTailFile + 1474 (splunkd)
  [0x000000000098D692] _ZNK11TailWatcher19initializeFileStateER15WatchedTailFileRK8Pathname + 66 (splunkd)
  [0x00000000009904B2] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 242 (splunkd)
  [0x0000000000EC2602] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 114 (splunkd)
  [0x0000000000EC3F90] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 464 (splunkd)
  [0x0000000000F53B2D] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 301 (splunkd)
  [0x0000000000EBD818] _ZN9EventLoop3runEv + 744 (splunkd)
  [0x000000000098E9ED] _ZN11TailWatcher3runEv + 141 (splunkd)
  [0x000000000099428A] _ZN13TailingThread4mainEv + 154 (splunkd)
  [0x0000000000F5165E] _ZN6Thread8callMainEPv + 62 (splunkd)
  [0x0000003E7CE079D1] ? (/lib64/libpthread.so.0)
  [0x0000003E7CAE88FD] clone + 109 (/lib64/libc.so.6)
 Linux / rtlvpxawsb.labcorp.com / 2.6.32-504.16.2.el6.x86_64 / #1 SMP Tue Mar 10 17:01:00 EDT 2015 / x86_64
 Last few lines of stderr (may contain info on assertion failure, but also could be old):
    2015-10-14 16:39:28.077 -0400 splunkd started (build 271043)
    Conf mutator lockfile has disappeared; error condition possible.
    2015-10-15 15:07:01.431 -0400 splunkd started (build 271043)
    Conf mutator lockfile has disappeared; error condition possible.
    2015-10-15 16:25:07.121 -0400 splunkd started (build 271043)
    Conf mutator lockfile has disappeared; error condition possible.
    2015-10-29 14:25:14.432 -0400 splunkd started (build 271043)
    splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.
    2015-12-15 08:16:03.012 -0500 splunkd started (build 271043)
    splunkd: /home/build/build-src/6.2.4/src/pipeline/input/Tailing.h:120: bool StatWrap::isDir() const: Assertion `_valid' failed.

 /etc/redhat-release: Red Hat Enterprise Linux Server release 6.6 (Santiago)
 glibc version: 2.12
 glibc release: stable
Last errno: 2
Threads running: 30
argv: [splunkd -p 8089 start]
Thread: "MainTailingThread", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7fd2b1c6d150:
00000000  00 a7 5f aa d2 7f 00 00                           |.._.....|
00000008

First 512 bytes of Timeout object @0x7fd2aa5f9a88:
00000000  10 f2 6d 01 00 00 00 00  00 00 00 00 00 00 00 00  |..m.............|
00000010  38 98 5f aa d2 7f 00 00  00 00 00 00 00 00 00 00  |8._.............|
00000020  00 00 00 00 00 00 00 00  29 97 ad 56 00 00 00 00  |........)..V....|
00000030  3e 94 0d 00 00 00 00 00  00 00 00 00 00 00 00 00  |>...............|
00000040  80 9a 5f aa d2 7f 00 00  20 9c 5f aa d2 7f 00 00  |.._..... ._.....|
00000050  01 00 00 00 01 00 00 00  c0 a7 14 a9 d2 7f 00 00  |................|
00000060  80 d6 1c b0 d2 7f 00 00  c0 28 10 a9 d2 7f 00 00  |.........(......|
00000070  00 43 8a a9 d2 7f 00 00  00 9b 5f aa d2 7f 00 00  |.C........_.....|
00000080  00 9b 5f aa d2 7f 00 00  10 9b 5f aa d2 7f 00 00  |.._......._.....|
00000090  10 9b 5f aa d2 7f 00 00  20 9b 5f aa d2 7f 00 00  |.._..... ._.....|
000000a0  20 9b 5f aa d2 7f 00 00  c0 41 8a a9 d2 7f 00 00  | ._......A......|
000000b0  40 3f 8a a9 d2 7f 00 00  00 00 00 00 00 00 00 00  |@?..............|
000000c0  00 e0 0d ab d2 7f 00 00  14 10 00 00 00 00 00 00  |................|
000000d0  51 f3 4b 55 00 00 00 00  78 49 61 01 00 00 00 00  |Q.KU....xIa.....|
000000e0  d8 f6 16 b0 d2 7f 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 00 00  80 4b 1f ab d2 7f 00 00  |.........K......|
00000100  10 4c 1f ab d2 7f 00 00  c0 4d 1f ab d2 7f 00 00  |.L.......M......|
00000110  0c 00 00 00 00 00 00 00  00 e4 1e b0 d2 7f 00 00  |................|
00000120  64 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |d...............|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  b8 9b 5f aa d2 7f 00 00  b8 9b 5f aa d2 7f 00 00  |.._......._.....|
00000150  00 00 00 00 00 00 00 00  80 63 1a b0 d2 7f 00 00  |.........c......|
00000160  e0 8d 82 af d2 7f 00 00  00 00 00 00 00 00 00 00  |................|
00000170  18 2e 45 b2 d2 7f 00 00  00 e4 1e b0 d2 7f 00 00  |..E.............|
00000180  00 48 0c ab d2 7f 00 00  30 48 0c ab d2 7f 00 00  |.H......0H......|
00000190  40 48 0c ab d2 7f 00 00  26 00 00 00 10 00 00 00  |@H......&.......|
000001a0  80 a7 14 a9 d2 7f 00 00  00 00 00 00 00 00 00 00  |................|
000001b0  88 9a 5f aa d2 7f 00 00  00 00 00 00 00 00 00 00  |.._.............|
000001c0  25 00 00 00 d2 7f 00 00  40 6b 2e b0 d2 7f 00 00  |%.......@k......|
000001d0  10 00 00 00 aa aa aa aa  00 00 00 00 00 00 00 00  |................|
000001e0  00 00 00 00 d2 7f 00 00  60 99 5f aa d2 7f 00 00  |........`._.....|
000001f0  00 a7 5f aa d2 7f 00 00  00 00 00 00 00 00 00 00  |.._.............|
00000200
FilesystemChangeWatcher: _timeoutActive=Y, _throttled=N, _waitingForNotifyCount=1
  EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=N

WatchedTailFile-WatchedFileState: path="/etc/httpd/logs/stage-phoenix.labcorp.com-ssl-request.log.5", flags=0x24023
First 144 bytes of PathnameStat @0x7fd2a98a4348:
00000000  30 2c 20 73 6f 75 72 63  65 50 6f 72 74 3d 38 30  |0, sourcePort=80|
00000010  38 39 2c 20 64 65 73 74  49 70 3d 31 30 2e 31 31  |89, destIp=10.11|
00000020  31 2e 31 2e 31 39 35 2c  20 64 65 73 74 50 6f 72  |1.1.195, destPor|
00000030  74 3d 39 39 39 37 2c 20  5f 74 63 70 5f 42 70 73  |t=9997, _tcp_Bps|
00000040  3d 32 34 31 39 2e 34 37  2c 20 5f 74 63 70 5f 4b  |=2419.47, _tcp_K|
00000050  42 70 73 3d 32 2e 33 36  2c 20 5f 74 63 70 5f 61  |Bps=2.36, _tcp_a|
00000060  76 67 5f 74 68 72 75 70  75 74 3d 32 2e 33 36 2c  |vg_thruput=2.36,|
00000070  20 5f 74 63 70 5f 4b 70  72 6f 63 65 73 73 65 64  | _tcp_Kprocessed|
00000080  3d 37 31 2c 20 5f 74 63  70 5f 65 70 73 3d 31 2e  |=71, _tcp_eps=1.|
00000090
FilesystemChangeWatcher: _timeoutActive=Y, _throttled=N, _waitingForNotifyCount=1
  EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=N
  Timeout: _when = 2321382613982983482.5641075399597568045, _initialMsec = 8247328199548096326
file-in: _initialized=Y, _lastCharWasNewline=N, _lastReadHadNulls=N, _wasCrcConflict=N, _warned=N
         _nullsWarned=N, _wasTooNew=N, _exists=N, _noDebug=N
         _hadExplicitSource=N, _crossedInitCrcLenBoundary=N, _classifiedAtLeastOnce=N, _fileReplaced=N, _readPathAfterRealEOF=N
         _onlyNotifiedOnce=Y, _isArchive=N, _isCached=111213, _unowned=N, _deleteOnEOF=N
         _overrideDeleteOnEOF=N, _doNotDeleteChildren=N, _readFromEnd=N, _readIrregardless=N
         _fileCheckMethod=0, _crcSalt=<null>, _origPath=<null>
         _bytesRead=0, _storingBytesRead=0, _initCrc=0x0, _seekCrc=0x0
         _filenameCrc=0x16ab246dab3357c1, _fallbackCrc=0x0, _lastEOFTime=<zero>, _modTime=<zero>
         _eofSeconds=3, _ignoreThresh=<zero>, _initCrcBytes=256, _initCrcForBatch=0x0
         _pendingMetadata=<null>
         _prevFd=-1, _pdModels=[0 PDs]
         _rescheduleDelay=1000, _rescheduleTarget=<zero>, _name=/etc/httpd/logs/stage-phoenix.labcorp.com-ssl-request.log.5, _statusName=
         _st=[dev=64773, ino=36, mode=100644, size=7204328, mtime=1453796447, owner=0, group=3000]
         _toStringPrefix=state=0x0x7fd2a98a42c0, _backoff=0
         _stdataInputHeaderProcessing=[]

         _detectTrailingNulls=N, _detectReadingFromOffSet=N, _readAndSkipHeader=N, _uniqueId=439908
  _rawPath=


x86 CPUID registers:
         0: 0000000D 756E6547 6C65746E 49656E69
         1: 000206D7 02010800 9E982203 0FABFBFF
         2: 76035A01 00F0B2FF 00000000 00CA0000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000000 00000000 00000000 00000000
         6: 00000077 00000002 00000009 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000001 00000000 00000000 00000000
         A: 07300401 0000007F 00000000 00000000
         B: 00000000 00000000 000000FD 00000002
         C: 00000000 00000000 00000000 00000000
         D: 00000000 00000000 00000000 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000001 28100800
  80000002: 20202020 49202020 6C65746E 20295228
  80000003: 6E6F6558 20295228 20555043 342D3545
  80000004: 20303436 20402030 30342E32 007A4847
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 00003028 00000000 00000000 00000000
terminating...
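For anyone triaging a similar crash log, the lines most likely to help are the crashing thread, the build, the signal, the assertion messages in the stderr tail, the last errno, and the file the tailer was watching. Below is a minimal Python sketch that pulls those fields out of a log like the one above. The script name and field names are hypothetical, and the regexes are written against this 6.2.4 log format only. Keep in mind that the log itself warns the stderr lines "could be old", so a matching assertion is a lead, not proof.

```python
import re
import sys

# Single-capture-group patterns for the fields that usually matter most when
# triaging a splunkd crash log. They are written against the 6.2.4 log format
# shown above; other builds may word these lines differently.
PATTERNS = {
    "build": re.compile(r"^\[build (\d+)\]"),
    "signal": re.compile(r"Received fatal signal (\d+ \(\w+\))"),
    "thread": re.compile(r"Crashing thread: (\S+)"),
    "errno": re.compile(r"^Last errno: (\d+)"),
    "watched_file": re.compile(r'WatchedFileState: path="([^"]+)"'),
}
# Assertion messages copied from stderr; the log warns these may be stale.
ASSERTION = re.compile(r"Assertion `.+' failed")


def summarize(lines):
    """Return the first match for each field plus the distinct assertion lines."""
    summary = {"assertions": []}
    for raw in lines:
        line = raw.strip()
        for key, pattern in PATTERNS.items():
            if key in summary:
                continue  # keep only the first occurrence of each field
            match = pattern.search(line)
            if match:
                summary[key] = match.group(1)
        if ASSERTION.search(line) and line not in summary["assertions"]:
            summary["assertions"].append(line)
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python crash_summary.py <crash log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for key, value in summarize(handle).items():
            print(f"{key}: {value}")
```

Run against the log above, this should surface build 271043, signal 6 (Aborted), MainTailingThread, last errno 2, the stage-phoenix ssl-request log path, and a single `StatWrap::isDir()` assertion line (the two identical stderr entries are de-duplicated).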

cramasta
Builder

You might be hitting bug SPL-104017. Version 6.2.5 and later include a fix for it, so I would suggest upgrading if possible.

Here are some other people discussing the same issue:
https://answers.splunk.com/answers/290645/why-is-our-splunk-624-forwarder-on-linux-crashing.html
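If you manage many forwarders, a quick check of which ones are still below 6.2.5 can help decide where to upgrade first. The Python sketch below treats anything older than 6.2.5 as *possibly* affected, based only on the statement above that 6.2.5+ carries the fix. The function names are hypothetical, and reading `$SPLUNK_HOME/etc/splunk.version` for a `VERSION=` line is an assumption about the install layout.

```python
import os
import re

# Per the answer above, 6.2.5 and later are reported to include the fix for
# SPL-104017, so anything older is treated here as *possibly* affected.
FIXED_IN = (6, 2, 5)


def parse_version(text):
    """Extract (major, minor, patch) from text like '6.2.4' or 'VERSION=6.2.4'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def may_be_affected(version_text):
    """True if the version predates the release reported to carry the fix."""
    return parse_version(version_text) < FIXED_IN


def installed_version(splunk_home):
    """Read VERSION= from $SPLUNK_HOME/etc/splunk.version (assumed file layout)."""
    path = os.path.join(splunk_home, "etc", "splunk.version")
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("VERSION="):
                return line.split("=", 1)[1].strip()
    return None
```

For example, `may_be_affected("6.2.4")` is `True` and `may_be_affected("6.2.5")` is `False`. On a host you would call `may_be_affected(installed_version("/opt/splunk"))`, where `/opt/splunk` is a hypothetical `SPLUNK_HOME`; `installed_version` returns `None` if no `VERSION=` line is found, so check for that first.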


ricarrol
New Member

I have a very similar problem. The crash occurs when we move day-old Apache log files out of the active folder to an archive folder. The log file identified in the crash log is one of the no-longer-active Apache logs.
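The pattern described here lines up with two details in the crash log above: `_exists=N` on the watched file and `Last errno: 2`, which on Linux is ENOENT ("No such file or directory"). The Python sketch below does not reproduce splunkd or SPL-104017; it only illustrates, with hypothetical directory and file names, that a file stat'd while live and then moved to an archive folder fails with ENOENT when re-stat'd by its old path. That is consistent with, but does not prove, the `StatWrap::isDir()` assertion on an invalid stat.

```python
import errno
import os
import shutil
import tempfile


def stat_after_move():
    """Stat a file, move it to an archive folder, then re-stat the old path.

    Returns (size seen before the move, errno raised by the re-stat or None).
    All paths are hypothetical stand-ins for the active/archive log folders.
    """
    with tempfile.TemporaryDirectory() as root:
        active = os.path.join(root, "active")
        archive = os.path.join(root, "archive")
        os.makedirs(active)
        os.makedirs(archive)
        log = os.path.join(active, "ssl-request.log.5")
        with open(log, "w") as f:
            f.write("GET / HTTP/1.1\n")
        before = os.stat(log)  # what a tailer might record while the file is live
        shutil.move(log, archive)  # the daily archiving job moves it away
        try:
            os.stat(log)  # a later re-stat by the old path
        except FileNotFoundError as exc:
            return before.st_size, exc.errno
        return before.st_size, None


size, err = stat_after_move()
print(size, err, errno.errorcode.get(err))
```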
