I'm noticing that our indexers are crashing and not coming back gracefully. I've looked in the logs and keep seeing segfault errors. It really puts extra strain on the system when 3-4 indexers go down all at once. I think it has something to do with timing, but I'm not sure yet.
Cause:
Unknown signal origin (si_code=128, si_addr=[0x0000000000000000]).
 Crashing thread: indexerPipe_1
 Registers:
    RIP:  [0x000055923E1027F0] _ZN14IndexProcessor18rollAllHotForIndexERK3StriS2_RKSt13unordered_mapIS0_6ObjRefI11IndexWriterE8hash_str6eq_strSaISt4pairIS1_S6_EEE + 592 (splunkd + 0xA6C7F0)
    RDI:  [0xFFFFFFFFFFFFFFF7]
    RSI:  [0x0000000000000004]
    RBP:  [0x0000000000000001]
    RSP:  [0x00007F47507FEA30]
    RAX:  [0x0000000000000000]
    RBX:  [0x0E00000001000000]
    RCX:  [0x0000000000000000]
    RDX:  [0x0000000000000400]
    R8:  [0x000055923F4E6449]
    R9:  [0x00007F475AC9D130]
    R10:  [0x00007F476C9B1D50]
    R11:  [0x00007F476B000080]
    R12:  [0x00007F472B7D0670]
    R13:  [0x00007F472B7D05D0]
    R14:  [0x0000000000000000]
    R15:  [0x00007F472B7D06C0]
    EFL:  [0x0000000000010246]
    TRAPNO:  [0x000000000000000D]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]
 OS: Linux
 Arch: x86-64
using CLOCK_MONOTONIC
Thread: "indexerPipe_1", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f475082a010:
00000000  00 f7 7f 50 47 7f 00 00                           |...PG...|
00000008
x86 CPUID registers:
         0: 0000000F 756E6547 6C65746E 49656E69
         1: 000306F2 00100800 7FFEFBFF BFEBFBFF
         2: 76036301 00F0B5FF 00000000 00C10000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000040 00000040 00000003 00002120
         6: 00000077 00000002 00000009 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000001 00000000 00000000 00000000
         A: 07300403 00000000 00000000 00000603
         B: 00000000 00000000 000000AD 00000000
         C: 00000000 00000000 00000000 00000000
         D: 00000000 00000000 00000000 00000000
         E: 00000000 00000000 00000000 00000000
         F: 00000000 00000000 00000000 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000021 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 2D354520 30343632 20337620
  80000004: 2E322040 48473036 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302E 00000000 00000000 00000000
terminating...
I've seen this in a couple of environments, so I don't think it's a unique problem.
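When several indexers go down at once, it can help to confirm whether every crash report shows the same failing frame. Below is a minimal Python sketch that assumes the reports are plain-text files named `crash-*.log`. On a typical install they sit under `$SPLUNK_HOME/var/log/splunk`, but verify that on your systems. The function names are hypothetical, and the only signature checked is the `rollAllHotForIndex` frame from the report above.

```python
import re
from pathlib import Path

# Frame name taken from the RIP line of the crash report above.
SIGNATURE = "rollAllHotForIndex"
THREAD_RE = re.compile(r"Crashing thread:\s*(\S+)")


def classify_crash(text: str) -> dict:
    """Return the crashing thread and whether the report contains the hot-bucket-roll frame."""
    match = THREAD_RE.search(text)
    return {
        "thread": match.group(1) if match else None,
        "hot_roll_crash": SIGNATURE in text,
    }


def scan_crash_dir(log_dir: str) -> dict:
    """Classify every crash-*.log file in log_dir (file naming is an assumption)."""
    results = {}
    for path in sorted(Path(log_dir).glob("crash-*.log")):
        # errors="replace" keeps odd bytes in a crash dump from aborting the scan
        results[path.name] = classify_crash(path.read_text(errors="replace"))
    return results
```

Running `scan_crash_dir` on each indexer and comparing the results shows whether all the crashes share this one signature or whether more than one problem is involved.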
If it's not a bug then it must be a bug
Hi,
did you perform any kind of update lately (Splunk Enterprise, NFS, or the OS)?
I've seen behavior like this in all three cases. Knowing when this started may help identify the cause.
How can a customer review the bug information for SPL-148969?
Submit a support case; they can tell you.
We had exactly the same problem, and it started after we upgraded to version 7.0.2.
After we opened a case with Splunk, they instructed us to upgrade to version 7.0.3 or higher, because it's a bug that was fixed in "SPL-148969, SPL-148600 Indexer may crash during hot bucket rolling following a streaming failure".
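To check which indexers are still below the fix version, here is a small sketch that assumes plain dotted numeric version strings such as `7.0.2`. Collecting the versions from each indexer is left out, and `predates_fix` is a hypothetical helper. It only tells you whether a version is older than 7.0.3. It does not tell you whether that version is actually affected by SPL-148969.

```python
# Fix version quoted by Splunk support in the reply above.
FIXED_IN = (7, 0, 3)


def parse_version(version: str) -> tuple:
    """Turn '7.0.2' into (7, 0, 2); assumes purely numeric dotted components."""
    return tuple(int(part) for part in version.split("."))


def predates_fix(version: str) -> bool:
    """True if the version sorts below 7.0.3."""
    parts = parse_version(version)
    # Pad short versions so "7.1" compares as (7, 1, 0)
    parts = parts + (0,) * (3 - len(parts))
    return parts < FIXED_IN
```

Tuple comparison is used instead of string comparison so that, for example, `7.0.10` correctly sorts above `7.0.3`.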
Hope this helps you.
