Splunk Enterprise

How can I stop indexer from crashing frequently?

MohammedTaher
Engager

Hello Splunkers

I'm facing an issue with my indexer: it crashes every 1-2 hours, and sometimes it crashes suddenly within 10 minutes of a restart.

Indexer specs : 
CentOS Linux 7
24 CPU RAM 
1 TB SSD

Splunk Version : Splunk 8.2.1 (build ddff1c41e5cf)

 

Crash logs : 

Received fatal signal 6 (Aborted) on PID 19932.
Cause:
Signal sent by PID 19932 running under UID 1000.
Crashing thread: tailreader0
Registers:
RIP: [0x00002B87E7282277] gsignal + 55 (libc.so.6 + 0x36277)
RDI: [0x0000000000004DDC]
RSI: [0x0000000000004EF5]
RBP: [0x00002B87E73D6580]
RSP: [0x00002B880E3FF608]
RAX: [0x0000000000000000]
RBX: [0x00002B87E5F9C000]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x0000000000000090]
R9: [0x00002B87E7800080]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x000055EFC1BE60C8]
R13: [0x000055EFC1BE6098]
R14: [0x00002B87E5EAD8C8]
R15: [0x00002B880E88E930]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]

OS: Linux
Arch: x86-64

Backtrace (PIC build):
[0x00002B87E7282277] gsignal + 55 (libc.so.6 + 0x36277)
[0x00002B87E7283968] abort + 328 (libc.so.6 + 0x37968)
[0x00002B87E727B096] ? (libc.so.6 + 0x2F096)
[0x00002B87E727B142] ? (libc.so.6 + 0x2F142)
[0x000055EFBF459410] ? (splunkd + 0x131A410)
[0x000055EFBFB3B102] _ZN3WTF23quickCheckForRolledFileERK8Pathname + 210 (splunkd + 0x19FC102)
[0x000055EFBFB3B947] _ZN3WTF13loadFishStateEP11PipelineSetb + 855 (splunkd + 0x19FC947)
[0x000055EFBFB300E8] _ZN10TailReader8readFileER15WatchedTailFile + 200 (splunkd + 0x19F10E8)
[0x000055EFBFB303A0] _ZN10TailReader4readEP15WatchedTailFileP11TailWatcher + 208 (splunkd + 0x19F13A0)
[0x000055EFBFB30D32] _ZN10TailReader10handleFileEP15WatchedTailFileP11TailWatcher + 514 (splunkd + 0x19F1D32)
[0x000055EFBF91F57A] _ZN12ReaderThread4mainEv + 746 (splunkd + 0x17E057A)
[0x000055EFC07F4C47] _ZN6Thread8callMainEPv + 135 (splunkd + 0x26B5C47)
[0x00002B87E7037E25] ? (libpthread.so.0 + 0x7E25)
[0x00002B87E734ABAD] clone + 109 (libc.so.6 + 0xFEBAD)
Linux / SRV-HO-SPLUNKIDX / 3.10.0-862.11.6.el7.x86_64 / #1 SMP Tue Aug 14 21:49:04 UTC 2018 / x86_64
Libc abort message: splunkd: /opt/splunk/src/pipeline/input/WatchedTailFile.cpp:249: void WTF::assertAndDump(bool, const Str&) const: Assertion `0 && "See splunkd.log for crash reason."' failed.

/etc/redhat-release: CentOS Linux release 7.5.1804 (Core)
glibc version: 2.17
glibc release: stable
Last errno: 2
Threads running: 96
Runtime: 8926.636142s
argv: [splunkd -p 8089 start]
Regex JIT enabled

RE2 regex engine enabled

using CLOCK_MONOTONIC
Thread: "tailreader0", did_join=0, ready_to_run=Y, main_thread=N, token=47863354623744
MutexByte: MutexByte-waiting={none}
ReaderThread: mode=0, queueSize=14, shutdown=N, reconfigure=N, mode=0
Reading File-WatchedTailFile-WatchedFileState: path="/opt/splunk/var/log/introspection/resource_usage.log", flags=0x10000EB, alive
First 144 bytes of PathnameStat @0x2b880e890828:
00000000 00 fd 00 00 00 00 00 00 2d 96 0e 08 00 00 00 00 |........-.......|
00000010 01 00 00 00 00 00 00 00 80 81 00 00 e8 03 00 00 |................|
00000020 e8 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 66 48 00 00 00 00 00 00 00 10 00 00 00 00 00 00 |fH..............|
00000040 28 00 00 00 00 00 00 00 5a 49 52 64 00 00 00 00 |(.......ZIRd....|
00000050 9e 6a 2f 0b 00 00 00 00 5a 49 52 64 00 00 00 00 |.j/.....ZIRd....|
00000060 44 ae 3e 0b 00 00 00 00 5a 49 52 64 00 00 00 00 |D.>.....ZIRd....|
00000070 44 ae 3e 0b 00 00 00 00 00 00 00 00 00 00 00 00 |D.>.............|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090
FilesystemChangeWatcher: _timeoutActive=N, _throttled=N, _waitingForNotifyCount=18
EMPTY Q: waitingForTimeout=N, noAction=N, stat=Y, immediateStat=Y, readdir=Y, notify=Y
USING INOTIFY: wds=6, score(0xFD00)=999, hasScaledTImeouts=Y
Timeout: _when = 511211.936945614, _initialInterval = 3.000
file-in: _initialized=Y, _lastCharWasNewline=Y, _lastReadHadNulls=N, _wasCrcConflict=N, _warned=N
_nullsWarned=N, _wasTooNew=N, _exists=Y, _noDebug=N
_hadExplicitSource=N, _crossedInitCrcLenBoundary=N, _classifiedAtLeastOnce=Y, _fileReplaced=Y, _readPathAfterRealEOF=Y
_onlyNotifiedOnce=N, _isArchive=N, _isCached=343536, _unowned=N, _deleteOnEOF=N
_overrideDeleteOnEOF=N, _doNotDeleteChildren=N, _readFromEnd=N, _readIrregardless=N
_fileCheckMethod=0, _crcSalt=<null>, _origPath=<null>
_bytesRead=25000259, _storingBytesRead=0, _initCrc=0x56aefe7f2a71345b, _seekCrc=0xa8957fe5632ae3b
_filenameCrc=0x55d3f47641cff9b5, _fallbackCrc=0x0, _lastEOFTime=1683114330.495657534948, _modTime=1683114330.495656545355
_eofInterval=3.000, _ignoreThresh=0.000, _initCrcBytes=256, _initCrcForBatch=0x0
_pendingMetadata=<null>
_prevFd=331, _pdModels=[1 PD: [PD: flags=0x1540030, [_path] = "/opt/splunk/var/log/introspection/resource_usage.log", [_MetaData:Index] = "_introspection", [MetaData:Source] = "source::/opt/splunk/var/log/introspection/resource_usage.log", [MetaData:Host] = "host::SRV-HO-SPLUNKIDX", [MetaData:Sourcetype] = "sourcetype::splunk_resource_usage", [_hpn] = "_hpn", [_charSet] = "UTF-8", [_conf] = "source::/opt/splunk/var/log/introspection/resource_usage.log|host::SRV-HO-SPLUNKIDX|splunk_resource_usage|4982", [_channel] = "4982"]]
_rescheduleDelay=1.000, _rescheduleFresh=Y, _name=/opt/splunk/var/log/introspection/resource_usage.log, _statusName=
_st=[dev=64768, ino=135173677, mode=100600, size=18534, mtime=1683114330, owner=1000, group=1000]
_toStringPrefix=state=0x0x2b880e890780, _backoff=0
_stdataInputHeaderProcessing=[]

_detectTrailingNulls=N, _detectReadingFromOffSet=Y, _readAndSkipHeader=N, _uniqueId=4982
_rawPath=$SPLUNK_HOME/var/log/introspection

 

x86 CPUID registers:
0: 00000014 756E6547 6C65746E 49656E69
1: 000406F1 1E010800 FFFA3203 0F8BFBFF
2: 76036301 00F0B5FF 00000000 00C30000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000004 00000000 00000000 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000000 0000009D 0000001E
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
E: 00000000 00000000 00000000 00000000
F: 00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
11: 00000000 00000000 00000000 00000000
12: 00000000 00000000 00000000 00000000
13: 00000000 00000000 00000000 00000000
14: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000121 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30323632 20347620
80000004: 2E322040 48473031 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 0000302B 00000000 00000000 00000000
terminating...
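
For reference, the CPU model can be read out of the CPUID dump above. This is a minimal sketch, assuming the four columns on each line are EAX, EBX, ECX, EDX in that order (leaf 0 is consistent with this, since its last three columns spell "GenuineIntel" when read as EBX, EDX, ECX). The helper name `decode_brand` is only illustrative.

```python
import struct

# EAX, EBX, ECX, EDX for leaves 0x80000002..0x80000004,
# copied from the "x86 CPUID registers" dump above.
BRAND_LEAVES = [
    (0x65746E49, 0x2952286C, 0x6F655820, 0x2952286E),
    (0x55504320, 0x2D354520, 0x30323632, 0x20347620),
    (0x2E322040, 0x48473031, 0x0000007A, 0x00000000),
]

def decode_brand(leaves):
    """Concatenate the registers as little-endian bytes and cut at the first NUL."""
    raw = b"".join(struct.pack("<4I", *regs) for regs in leaves)
    return raw.split(b"\x00", 1)[0].decode("ascii").strip()

brand = decode_brand(BRAND_LEAVES)
print(brand)
```

On this dump the decoded string is "Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz".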

Correlating the crash with splunkd.log:


05-03-2023 14:46:00.547 +0300 ERROR WatchedFile [20213 tailreader0] - About to assert due to: should have gotten back
a record from fishbucket: state=0x0x2b880e890780 wtf=0x0x2b880e88e800 off=25000259 initcrc=0x56aefe7f2a71345b scrc=0
xa8957fe5632ae3b fallbackcrc=0x0 last_eof_time=1683114330 reschedule_fresh=Y is_cached=343536 fd_valid=true exists=tr
ue last_char_newline=true on_block_boundary=false only_notified_once=false was_replaced=true eof_seconds=3 delay_done
key_until_close=false unowned=false always_read=false was_too_new=false name="/opt/splunk/var/log/introspection/resource_usage.log"
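
To make the ERROR line easier to compare with the crash log, here is a small sketch that pulls the key=value fields out of it. The line is abbreviated to a few fields, the regex and the name `parse_fields` are my own, and the file size is copied by hand from the `_st=[... size=18534 ...]` line of the crash log. It only lines the numbers up and is not a diagnosis.

```python
import re
from datetime import datetime, timedelta, timezone

# Abbreviated copy of the ERROR line above; only some fields are kept.
LOG_LINE = (
    'state=0x0x2b880e890780 wtf=0x0x2b880e88e800 off=25000259 '
    'initcrc=0x56aefe7f2a71345b scrc=0xa8957fe5632ae3b fallbackcrc=0x0 '
    'last_eof_time=1683114330 was_replaced=true '
    'name="/opt/splunk/var/log/introspection/resource_usage.log"'
)

# A value is either double-quoted or runs up to the next whitespace.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_fields(line):
    """Return the key=value pairs of a log line as a dict of strings."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}

fields = parse_fields(LOG_LINE)
stored_offset = int(fields["off"])
current_size = 18534  # size= from the _st line of the crash log

# The log timestamps use +0300, so show the last EOF time in the same zone.
last_eof = datetime.fromtimestamp(
    int(fields["last_eof_time"]), tz=timezone(timedelta(hours=3))
)

print(fields["name"])
print("stored offset:", stored_offset, "current size:", current_size)
print("last EOF:", last_eof.isoformat())
```

With these values the stored offset (25000259) is far larger than the current file size (18534 bytes) and was_replaced=true. That would be consistent with resource_usage.log having been rolled shortly before the assertion, but it is only an observation from the numbers above. The last EOF time works out to 14:45:30 +0300, about 30 seconds before the ERROR line.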
