Re: Why do my Indexers, on Linux, keep segfaulting...

klopez30 · 02-13-2018

I'm noticing that our indexers are crashing and not coming back gracefully. I've looked in the logs and keep seeing segfault errors. It puts a lot of extra strain on the system when 3-4 indexers go down at once. I suspect it has something to do with the time, but I'm not sure yet.

Cause:

Unknown signal origin (si_code=128, si_addr=[0x0000000000000000]).
 Crashing thread: indexerPipe_1
 Registers:
    RIP:  [0x000055923E1027F0] _ZN14IndexProcessor18rollAllHotForIndexERK3StriS2_RKSt13unordered_mapIS0_6ObjRefI11IndexWriterE8hash_str6eq_strSaISt4pairIS1_S6_EEE + 592 (splunkd + 0xA6C7F0)
    RDI:  [0xFFFFFFFFFFFFFFF7]
    RSI:  [0x0000000000000004]
    RBP:  [0x0000000000000001]
    RSP:  [0x00007F47507FEA30]
    RAX:  [0x0000000000000000]
    RBX:  [0x0E00000001000000]
    RCX:  [0x0000000000000000]
    RDX:  [0x0000000000000400]
    R8:  [0x000055923F4E6449]
    R9:  [0x00007F475AC9D130]
    R10:  [0x00007F476C9B1D50]
    R11:  [0x00007F476B000080]
    R12:  [0x00007F472B7D0670]
    R13:  [0x00007F472B7D05D0]
    R14:  [0x0000000000000000]
    R15:  [0x00007F472B7D06C0]
    EFL:  [0x0000000000010246]
    TRAPNO:  [0x000000000000000D]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

using CLOCK_MONOTONIC
Thread: "indexerPipe_1", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f475082a010:
00000000  00 f7 7f 50 47 7f 00 00                           |...PG...|
00000008


x86 CPUID registers:
         0: 0000000F 756E6547 6C65746E 49656E69
         1: 000306F2 00100800 7FFEFBFF BFEBFBFF
         2: 76036301 00F0B5FF 00000000 00C10000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000040 00000040 00000003 00002120
         6: 00000077 00000002 00000009 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000001 00000000 00000000 00000000
         A: 07300403 00000000 00000000 00000603
         B: 00000000 00000000 000000AD 00000000
         C: 00000000 00000000 00000000 00000000
         D: 00000000 00000000 00000000 00000000
         E: 00000000 00000000 00000000 00000000
         F: 00000000 00000000 00000000 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000021 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 2D354520 30343632 20337620
  80000004: 2E322040 48473036 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302E 00000000 00000000 00000000
terminating...

I've seen this in a couple of environments, so I don't think it's a unique problem.
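When comparing crash logs across environments, the two lines that matter most above are the crashing thread and the RIP frame. The RIP symbol is an Itanium C++ ABI mangled name. Decoding just its leading nested name gives IndexProcessor::rollAllHotForIndex, which suggests the crash happened while rolling hot buckets. Below is a minimal Python sketch of that extraction, assuming the crash log text is available as a string. summarize_crash and nested_name are hypothetical helpers, and nested_name is deliberately not a full demangler (c++filt does the complete job).

```python
import re

# Excerpt of the crash log above (only the lines this sketch needs).
CRASH_LOG = """\
 Crashing thread: indexerPipe_1
 Registers:
    RIP:  [0x000055923E1027F0] _ZN14IndexProcessor18rollAllHotForIndexERK3StriS2_RKSt13unordered_mapIS0_6ObjRefI11IndexWriterE8hash_str6eq_strSaISt4pairIS1_S6_EEE + 592 (splunkd + 0xA6C7F0)
"""


def nested_name(mangled: str) -> str:
    """Decode only the leading nested name (_ZN<len><id>...E) of a mangled symbol.

    Parameter types after the closing 'E' are ignored; this is NOT a full demangler.
    """
    if not mangled.startswith("_ZN"):
        return mangled
    parts = []
    pos = 3
    while pos < len(mangled) and mangled[pos] != "E":
        m = re.match(r"\d+", mangled[pos:])
        if m is None:
            break  # qualifiers or templates this sketch does not handle
        length = int(m.group())
        start = pos + len(m.group())
        parts.append(mangled[start:start + length])
        pos = start + length
    return "::".join(parts)


def summarize_crash(log: str) -> dict:
    """Pull the crashing thread and the RIP frame out of a crash log."""
    thread = re.search(r"Crashing thread:\s*(\S+)", log)
    rip = re.search(r"RIP:\s*\[(0x[0-9A-Fa-f]+)\]\s+(\S+)", log)
    return {
        "thread": thread.group(1) if thread else None,
        "rip_address": rip.group(1) if rip else None,
        "function": nested_name(rip.group(2)) if rip else None,
    }


print(summarize_crash(CRASH_LOG))
```

Running the same extraction over crash logs from each environment makes it quick to see whether they all die in the same function.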

DavidHourani · 12-07-2018

If it's not a bug then it must be a bug

dkeck · 12-07-2018

Hi,

Did you perform any kind of update lately: Splunk Enterprise, NFS, or the OS?

I've seen behavior like this after all three kinds of update. Knowing when this started may help identify the cause.
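One way to follow this suggestion (and to test the original poster's hunch about timing) is to line up crash times against a list of recent changes. A minimal Python sketch, assuming you have already collected crash timestamps and a change log by hand. All dates and labels below are placeholders, not data from this thread, and change_before_first_crash and crashes_by_hour are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime

# Placeholder data only: replace with real crash times (e.g. from each
# indexer's crash logs) and your own record of recent changes.
crashes = [
    datetime(2018, 2, 10, 3, 15),
    datetime(2018, 2, 12, 14, 20),
    datetime(2018, 2, 13, 3, 5),
]
changes = [
    (datetime(2018, 1, 20), "OS update"),
    (datetime(2018, 2, 8), "Splunk Enterprise update"),
    (datetime(2018, 2, 15), "NFS update"),
]


def change_before_first_crash(crashes, changes):
    """Return the latest (time, label) change at or before the first crash, or None."""
    first_crash = min(crashes)
    earlier = [change for change in changes if change[0] <= first_crash]
    return max(earlier, default=None)


def crashes_by_hour(crashes):
    """Count crashes per hour of day to see whether they cluster."""
    return Counter(t.hour for t in crashes)


print(change_before_first_crash(crashes, changes))
print(crashes_by_hour(crashes))
```

The first result points at the change most likely to have introduced the problem; the second shows whether crashes bunch up at particular hours, which is what a time-related cause would look like.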

optum · 12-07-2018

How can a customer review the bug information for SPL-148969?

martin_mueller · 12-11-2018

Submit a support case, they can tell you.

marcia01 · 06-12-2018

We had exactly the same problem, and it started after we upgraded to version 7.0.2.
After we opened a case with Splunk, Support instructed us to upgrade to version 7.0.3 or higher, because this is a bug that was fixed in "SPL-148969, SPL-148600 Indexer may crash during hot bucket rolling following a streaming failure".

Hope this helps you.
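The fix above boils down to a version comparison. A minimal Python sketch, assuming the version is a purely numeric dotted string such as "7.0.2" (build suffixes would need stripping first). needs_upgrade and parse_version are hypothetical helpers. This only says whether you are below the fix release named above, not whether a given older version is actually affected; Support can confirm that.

```python
# Fix release named by Splunk Support in the answer above.
FIXED_IN = (7, 0, 3)


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "7.0.2" into (7, 0, 2)."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, fixed_in=FIXED_IN) -> bool:
    """True if the installed version is older than the fix release."""
    parsed = parse_version(installed)
    # Pad short versions so "7.0" compares as (7, 0, 0).
    parsed += (0,) * (len(fixed_in) - len(parsed))
    return parsed < fixed_in


print(needs_upgrade("7.0.2"))
print(needs_upgrade("7.0.3"))
```

Comparing integer tuples avoids the classic string-comparison trap where "7.0.10" would sort before "7.0.3".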
