Hello Splunkers
I'm facing an issue with my indexer: it crashes every 1-2 hours, and sometimes it crashes as soon as 10 minutes after a restart.
Indexer specs: CentOS Linux 7, 24 CPU, RAM 1T SSD
Splunk Version : Splunk 8.2.1 (build ddff1c41e5cf)
Crash logs :
Received fatal signal 6 (Aborted) on PID 19932.
Cause: Signal sent by PID 19932 running under UID 1000.
Crashing thread: tailreader0
Registers:
RIP: [0x00002B87E7282277] gsignal + 55 (libc.so.6 + 0x36277)
RDI: [0x0000000000004DDC]
RSI: [0x0000000000004EF5]
RBP: [0x00002B87E73D6580]
RSP: [0x00002B880E3FF608]
RAX: [0x0000000000000000]
RBX: [0x00002B87E5F9C000]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x0000000000000090]
R9: [0x00002B87E7800080]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x000055EFC1BE60C8]
R13: [0x000055EFC1BE6098]
R14: [0x00002B87E5EAD8C8]
R15: [0x00002B880E88E930]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux Arch: x86-64
Backtrace (PIC build):
[0x00002B87E7282277] gsignal + 55 (libc.so.6 + 0x36277)
[0x00002B87E7283968] abort + 328 (libc.so.6 + 0x37968)
[0x00002B87E727B096] ? (libc.so.6 + 0x2F096)
[0x00002B87E727B142] ? (libc.so.6 + 0x2F142)
[0x000055EFBF459410] ? (splunkd + 0x131A410)
[0x000055EFBFB3B102] _ZN3WTF23quickCheckForRolledFileERK8Pathname + 210 (splunkd + 0x19FC102)
[0x000055EFBFB3B947] _ZN3WTF13loadFishStateEP11PipelineSetb + 855 (splunkd + 0x19FC947)
[0x000055EFBFB300E8] _ZN10TailReader8readFileER15WatchedTailFile + 200 (splunkd + 0x19F10E8)
[0x000055EFBFB303A0] _ZN10TailReader4readEP15WatchedTailFileP11TailWatcher + 208 (splunkd + 0x19F13A0)
[0x000055EFBFB30D32] _ZN10TailReader10handleFileEP15WatchedTailFileP11TailWatcher + 514 (splunkd + 0x19F1D32)
[0x000055EFBF91F57A] _ZN12ReaderThread4mainEv + 746 (splunkd + 0x17E057A)
[0x000055EFC07F4C47] _ZN6Thread8callMainEPv + 135 (splunkd + 0x26B5C47)
[0x00002B87E7037E25] ? (libpthread.so.0 + 0x7E25)
[0x00002B87E734ABAD] clone + 109 (libc.so.6 + 0xFEBAD)
Linux / SRV-HO-SPLUNKIDX / 3.10.0-862.11.6.el7.x86_64 / #1 SMP Tue Aug 14 21:49:04 UTC 2018 / x86_64
Libc abort message: splunkd: /opt/splunk/src/pipeline/input/WatchedTailFile.cpp:249: void WTF::assertAndDump(bool, const Str&) const: Assertion `0 && "See splunkd.log for crash reason."' failed.
/etc/redhat-release: CentOS Linux release 7.5.1804 (Core)
glibc version: 2.17
glibc release: stable
Last errno: 2
Threads running: 96
Runtime: 8926.636142s
argv: [splunkd -p 8089 start]
Regex JIT enabled
RE2 regex engine enabled
using CLOCK_MONOTONIC
Thread: "tailreader0", did_join=0, ready_to_run=Y, main_thread=N, token=47863354623744
MutexByte: MutexByte-waiting={none}
ReaderThread: mode=0, queueSize=14, shutdown=N, reconfigure=N, mode=0
Reading File-WatchedTailFile-WatchedFileState: path="/opt/splunk/var/log/introspection/resource_usage.log", flags=0x10000EB, alive
First 144 bytes of PathnameStat @0x2b880e890828:
00000000 00 fd 00 00 00 00 00 00 2d 96 0e 08 00 00 00 00 |........-.......|
00000010 01 00 00 00 00 00 00 00 80 81 00 00 e8 03 00 00 |................|
00000020 e8 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 66 48 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |fH..............|
00000040 28 00 00 00 00 00 00 00 5a 49 52 64 00 00 00 00 |(.......ZIRd....|
00000050 9e 6a 2f 0b 00 00 00 00 5a 49 52 64 00 00 00 00 |.j/.....ZIRd....|
00000060 44 ae 3e 0b 00 00 00 00 5a 49 52 64 00 00 00 00 |D.>.....ZIRd....|
00000070 44 ae 3e 0b 00 00 00 00 00 00 00 00 00 00 00 00 |D.>.............|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090
FilesystemChangeWatcher: _timeoutActive=N, _throttled=N, _waitingForNotifyCount=18
EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=Y
USING INOTIFY: wds=6, score(0xFD00)=999, hasScaledTImeouts=Y
Timeout: _when = 511211.936945614, _initialInterval = 3.000
file-in: _initialized=Y, _lastCharWasNewline=Y, _lastReadHadNulls=N, _wasCrcConflict=N, _warned=N
_nullsWarned=N, _wasTooNew=N, _exists=Y, _noDebug=N
_hadExplicitSource=N, _crossedInitCrcLenBoundary=N, _classifiedAtLeastOnce=Y, _fileReplaced=Y, _readPathAfterRealEOF=Y
_onlyNotifiedOnce=N, _isArchive=N, _isCached=343536, _unowned=N, _deleteOnEOF=N
_overrideDeleteOnEOF=N, _doNotDeleteChildren=N, _readFromEnd=N, _readIrregardless=N
_fileCheckMethod=0, _crcSalt=<null>, _origPath=<null>
_bytesRead=25000259, _storingBytesRead=0, _initCrc=0x56aefe7f2a71345b, _seekCrc=0xa8957fe5632ae3b
_filenameCrc=0x55d3f47641cff9b5, _fallbackCrc=0x0, _lastEOFTime=1683114330.495657534948, _modTime=1683114330.495656545355
_eofInterval=3.000, _ignoreThresh=0.000, _initCrcBytes=256, _initCrcForBatch=0x0
_pendingMetadata=<null>
_prevFd=331, _pdModels=[1 PD: [PD: flags=0x1540030, [_path] = "/opt/splunk/var/log/introspection/resource_usage.log", [_MetaData:Index] = "_introspection", [MetaData:Source] = "source::/opt/splunk/var/log/introspection/resource_usage.log", [MetaData:Host] = "host::SRV-HO-SPLUNKIDX", [MetaData:Sourcetype] = "sourcetype::splunk_resource_usage", [_hpn] = "_hpn", [_charSet] = "UTF-8", [_conf] = "source::/opt/splunk/var/log/introspection/resource_usage.log|host::SRV-HO-SPLUNKIDX|splunk_resource_usage|4982", [_channel] = "4982"]]
_rescheduleDelay=1.000, _rescheduleFresh=Y, _name=/opt/splunk/var/log/introspection/resource_usage.log, _statusName=
_st=[dev=64768, ino=135173677, mode=100600, size=18534, mtime=1683114330, owner=1000, group=1000]
_toStringPrefix=state=0x0x2b880e890780, _backoff=0
_stdataInputHeaderProcessing=[]
_detectTrailingNulls=N, _detectReadingFromOffSet=Y, _readAndSkipHeader=N, _uniqueId=4982
_rawPath=$SPLUNK_HOME/var/log/introspection
x86 CPUID registers:
0: 00000014 756E6547 6C65746E 49656E69
1: 000406F1 1E010800 FFFA3203 0F8BFBFF
2: 76036301 00F0B5FF 00000000 00C30000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000004 00000000 00000000 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000000 0000009D 0000001E
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
E: 00000000 00000000 00000000 00000000
F: 00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
11: 00000000 00000000 00000000 00000000
12: 00000000 00000000 00000000 00000000
13: 00000000 00000000 00000000 00000000
14: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000121 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30323632 20347620
80000004: 2E322040 48473031 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 0000302B 00000000 00000000 00000000
terminating...
Correlating the crash with splunkd.log:
05-03-2023 14:46:00.547 +0300 ERROR WatchedFile [20213 tailreader0] - About to assert due to: should have gotten back a record from fishbucket: state=0x0x2b880e890780 wtf=0x0x2b880e88e800 off=25000259 initcrc=0x56aefe7f2a71345b scrc=0xa8957fe5632ae3b fallbackcrc=0x0 last_eof_time=1683114330 reschedule_fresh=Y is_cached=343536 fd_valid=true exists=true last_char_newline=true on_block_boundary=false only_notified_once=false was_replaced=true eof_seconds=3 delay_done key_until_close=false unowned=false always_read=false was_too_new=false name="/opt/splunk/var/log/introspection/resource_usage.log"
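
To make that splunkd.log line easier to compare against the crash dump (for example, `off=25000259` there versus `size=18534` in the `_st=[...]` block), a small parser along these lines might help. This is only a sketch: `parse_fields` and its regex are hypothetical helpers written for this post, not a Splunk tool, and the sample is a shortened copy of the ERROR line above.

```python
import re

# Hypothetical helper: matches key=value tokens, where the value is either
# a double-quoted string or a run of non-whitespace characters.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(line: str) -> dict:
    """Return the key=value pairs found in one splunkd.log line."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


# Shortened copy of the ERROR line from splunkd.log quoted above.
sample = (
    '05-03-2023 14:46:00.547 +0300 ERROR WatchedFile [20213 tailreader0] - '
    'About to assert due to: should have gotten back a record from fishbucket: '
    'off=25000259 initcrc=0x56aefe7f2a71345b was_replaced=true '
    'name="/opt/splunk/var/log/introspection/resource_usage.log"'
)

fields = parse_fields(sample)
print(fields["off"], fields["was_replaced"], fields["name"])
```

Running the same function over every "About to assert" line in splunkd.log would show whether each crash points at the same file and offset.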