Hello everyone,
our indexers currently keep crashing at random. We run only Linux, with Splunk 9.0.2.
Any suggestions on what the crashing thread means and how to solve this?
Thank you.
Received fatal signal 6 (Aborted) on PID 235655.
Cause:
Signal sent by PID 235655 running under UID 1018.
Crashing thread: FwdDataReceiverThread
Registers:
RIP: [0x00007F4A05C3E387] gsignal + 55 (libc.so.6 + 0x36387)
RDI: [0x0000000000039887]
RSI: [0x00000000000399C9]
RBP: [0x000000000000008F]
RSP: [0x00007F49E4FFE238]
RAX: [0x0000000000000000]
RBX: [0x000055B8710F5CA8]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007F49E4FFF700]
R9: [0x00007F4A05C552CD]
R10: [0x0000000000000008]
R11: [0x0000000000000206]
R12: [0x000055B870FE5A93]
R13: [0x000055B8710F5D88]
R14: [0x000055B872226488]
R15: [0x00007F49E4FFE4E0]
EFL: [0x0000000000000206]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace (PIC build):
[0x00007F4A05C3E387] gsignal + 55 (libc.so.6 + 0x36387)
[0x00007F4A05C3FA78] abort + 328 (libc.so.6 + 0x37A78)
[0x000055B86E1D4D26] ? (splunkd + 0x1A08D26)
[0x000055B86EE39BD2] _ZN26HealthDistIngestionLatency29calculateAndUpdateHealthColorEv + 914 (splunkd + 0x266DBD2)
[0x000055B86E744627] _ZN22TcpInPipelineProcessor7processER15CowPipelineData + 199 (splunkd + 0x1F78627)
[0x000055B86E74CD57] _ZN14FwdDataChannel16s2sDataAvailableER15CowPipelineDataRK15S2SPerEventInfom + 167 (splunkd + 0x1F80D57)
[0x000055B86F2B2255] _ZN11S2SReceiver11finishEventEv + 261 (splunkd + 0x2AE6255)
[0x000055B86F059E48] _ZN18StreamingS2SParser5parseEPKcS1_ + 6520 (splunkd + 0x288DE48)
[0x000055B86E73E004] _ZN16CookedTcpChannel7consumeER18TcpAsyncDataBuffer + 244 (splunkd + 0x1F72004)
[0x000055B86E74055D] _ZN16CookedTcpChannel13dataAvailableER18TcpAsyncDataBuffer + 45 (splunkd + 0x1F7455D)
[0x000055B86F592D03] _ZN10TcpChannel11when_eventsE18PollableDescriptor + 531 (splunkd + 0x2DC6D03)
[0x000055B86F4D5BCC] _ZN8PolledFd8do_eventEv + 124 (splunkd + 0x2D09BCC)
[0x000055B86F4D6B39] _ZN9EventLoop3runEv + 617 (splunkd + 0x2D0AB39)
[0x000055B86F58D68C] _ZN19Base_TcpChannelLoop7_do_runEv + 28 (splunkd + 0x2DC168C)
[0x000055B86F58D78E] _ZN25SubordinateTcpChannelLoop3runEv + 222 (splunkd + 0x2DC178E)
[0x000055B86F59A16D] _ZN6Thread37_callMainAndDiscardTerminateExceptionEv + 13 (splunkd + 0x2DCE16D)
[0x000055B86F59B062] _ZN6Thread8callMainEPv + 178 (splunkd + 0x2DCF062)
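The splunkd frames in the backtrace above are C++ symbols in mangled form. As a reading aid, here is a minimal Python sketch that recovers the `Class::method` part of the simple `_ZN...E` names that appear in this trace. The function name `nested_name` is made up for illustration; the sketch ignores argument types, templates, and constructors, and it only decodes names, it does not diagnose the crash.

```python
import re


def nested_name(symbol):
    """Return 'Class::method' for a simple _ZN...E mangled name, else None.

    Handles only plain length-prefixed components (e.g. '6Thread8callMain');
    templates, constructors, and argument types are not decoded.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        match = re.match(r"\d+", symbol[pos:])
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    # A well-formed nested name is closed by 'E' right after the last component.
    if not parts or pos >= len(symbol) or symbol[pos] != "E":
        return None
    return "::".join(parts)


if __name__ == "__main__":
    # Symbols copied from the backtrace in this thread.
    for sym in (
        "_ZN26HealthDistIngestionLatency29calculateAndUpdateHealthColorEv",
        "_ZN22TcpInPipelineProcessor7processER15CowPipelineData",
        "_ZN6Thread8callMainEPv",
    ):
        print(nested_name(sym))
```

Read this way, the frame directly below `abort` is `HealthDistIngestionLatency::calculateAndUpdateHealthColor`, called from `TcpInPipelineProcessor::process` on the `FwdDataReceiverThread`. That is useful detail to include when opening a Support case.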
Hi @brennson90,
What reference hardware are you using?
Did you disable THP and configure ulimit correctly?
If your hardware resources meet the requirements and THP is disabled, open a case with Splunk Support.
Ciao.
Giuseppe
Hi @gcusello
THP is disabled.
This is our ulimit output:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256854
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Our machines have 24 CPU cores and 64 GB RAM.
Hi @brennson90 ,
OK, the main things to check are resources, THP, and ulimit.
Your ulimit values look low (for example, open files is 1024); to be sure, compare them with https://docs.splunk.com/Documentation/Splunk/9.0.2/Installation/Systemrequirements#Considerations_re...
To be more certain, please run the Monitoring Console's Health Check.
Then open a case with Splunk Support.
Ciao.
Giuseppe
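The ulimit comparison suggested above can be sketched in Python. This is a rough illustration, not an official check: the helper names `parse_ulimit` and `below_minimum` are made up, and the thresholds in `ASSUMED_MINIMUMS` are assumptions that should be confirmed against the Splunk system requirements page linked in the reply.

```python
import re

# Assumed minimums, for illustration only; confirm the real values on the
# Splunk system requirements page before relying on them.
ASSUMED_MINIMUMS = {"-n": 64000, "-u": 16000}


def parse_ulimit(text):
    """Map each flag in `ulimit -a` output (e.g. '-n') to an int, or None for 'unlimited'."""
    limits = {}
    for line in text.splitlines():
        # Matches '(-e) 0' as well as '(blocks, -c) 0' or '(512 bytes, -p) 8'.
        match = re.search(r"\((?:[^()]*,\s*)?(-\w)\)\s+(\S+)\s*$", line)
        if not match:
            continue
        flag, value = match.groups()
        if value == "unlimited":
            limits[flag] = None
        elif value.isdigit():
            limits[flag] = int(value)
    return limits


def below_minimum(limits, minimums=ASSUMED_MINIMUMS):
    """Return {flag: (current, minimum)} for every limit under its assumed minimum."""
    return {
        flag: (limits[flag], minimum)
        for flag, minimum in minimums.items()
        if limits.get(flag) is not None and limits[flag] < minimum
    }


if __name__ == "__main__":
    # Two lines copied from the ulimit output posted in this thread.
    posted = "open files (-n) 1024\nmax user processes (-u) 4096\n"
    print(below_minimum(parse_ulimit(posted)))
```

To check a live host, paste the output of `ulimit -a`, taken as the user that runs splunkd, into `parse_ulimit`. Limits reported as `unlimited` are treated as passing.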
Hi @gcusello
Thanks for your help. We'll check all the parameters and contact Splunk Support if needed.
Thank you.