
I have 3 indexers, why does the Master Node crash after one of them restarts?

emzet
Explorer

Hello,

I have 3 indexers. After one of them was restarted, the Master Node crashed, and it now writes a new crash log every minute (each time the restarted indexer tries to connect to the cluster).

Below is the crash log:

[build cd0848707637] 2022-03-29 17:48:34
Received fatal signal 6 (Aborted) on PID 3183981.
 Cause:
   Signal sent by PID 3183981 running under UID 1004.
 Crashing thread: CMAddPeerWorker-5
 Registers:
    RIP:  [0x00007FDB3792137F] gsignal + 271 (libc.so.6 + 0x3737F)
    RDI:  [0x0000000000000002]
    RSI:  [0x00007FDB121F9860]
    RBP:  [0x00007FDB37A74698]
    RSP:  [0x00007FDB121F9860]
    RAX:  [0x0000000000000000]
    RBX:  [0x0000000000000006]
    RCX:  [0x00007FDB3792137F]
    RDX:  [0x0000000000000000]
    R8:  [0x0000000000000000]
    R9:  [0x00007FDB121F9860]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000246]
    R12:  [0x0000555F4AA9B818]
    R13:  [0x0000555F4A93BC02]
    R14:  [0x00000000000003C2]
    R15:  [0x00007FDB16506238]
    EFL:  [0x0000000000000246]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x002B000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace (PIC build):
  [0x00007FDB3792137F] gsignal + 271 (libc.so.6 + 0x3737F)
  [0x00007FDB3790BDB5] abort + 295 (libc.so.6 + 0x21DB5)
  [0x00007FDB3790BC89] ? (libc.so.6 + 0x21C89)
  [0x00007FDB37919A76] ? (libc.so.6 + 0x2FA76)
  [0x0000555F497B294F] _ZN8CMBucket14setRASummariesERK4GuidRKSt3mapI3Str15CMBucketSummarySt4lessIS4_ESaISt4pairIKS4_S5_EEE + 623 (splunkd + 0x28C694F)
  [0x0000555F496C13C8] _ZN15CMAddPeerWorker15finishAddBucketERP8CMBucketR15BucketCSVStruct + 136 (splunkd + 0x27D53C8)
  [0x0000555F496C2320] _ZN15CMAddPeerWorker19addStandaloneBucketERK13IndexDataTypeR15BucketCSVStruct + 128 (splunkd + 0x27D6320)
  [0x0000555F496C24B3] _ZN15CMAddPeerWorker20processBucketBatchesEv + 291 (splunkd + 0x27D64B3)
  [0x0000555F48757588] _ZN15CMAddPeerWorker4mainEv + 552 (splunkd + 0x186B588)
  [0x0000555F4959B917] _ZN6Thread8callMainEPv + 135 (splunkd + 0x26AF917)
  [0x00007FDB37CB717A] ? (libpthread.so.0 + 0x817A)
  [0x00007FDB379E6DC3] clone + 67 (libc.so.6 + 0xFCDC3)
 Linux / splunk-master-prod-01.local.ad / 4.18.0-240.1.1.el8_3.x86_64 / #1 SMP Fri Oct 16 13:36:46 EDT 2020 / x86_64
 Libc abort message: splunkd: /opt/splunk/src/clustering/CMBucket.cpp:962: void CMBucket::setRASummaries(const Guid&, const CMBucketSummaries&): Assertion `hasPeer(peer)' failed.

 /etc/redhat-release: Red Hat Enterprise Linux release 8.5 (Ootpa)
 glibc version: 2.28
 glibc release: stable
Last errno: 0
Threads running: 103
Runtime: 56.398836s
argv: [splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd]
Regex JIT enabled

RE2 regex engine enabled

using CLOCK_MONOTONIC
Thread: "CMAddPeerWorker-5", did_join=0, ready_to_run=Y, main_thread=N, token=140578878629632
MutexByte: MutexByte-waiting={none}


x86 CPUID registers:
         0: 0000000D 756E6547 6C65746E 49656E69
         1: 000306F0 07040800 FFFA3203 1F8BFBFF
         2: 76036301 00F0B5FF 00000000 00C30000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000000 00000000 00000000 00000000
         6: 00000004 00000000 00000000 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300401 000000FF 00000000 00000000
         B: 00000000 00000000 00000047 00000007
         C: 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000021 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 2D354520 30383632 20347620
  80000004: 2E322040 48473034 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302B 00000000 00000000 00000000
terminating...

In addition, indexer-1 (the one that was rebooted) cannot join the cluster.

Has anyone had this problem, and how can it be resolved?

If more information is needed, I can send it.
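To confirm that every crash log written once a minute is the same failure, a small script can pull the key fields out of each file. The following is a minimal Python sketch, not a Splunk-provided tool: the regular expressions only match the line formats visible in the crash log above, the `crash-*.log` file name pattern and the `$SPLUNK_HOME/var/log/splunk` location are assumptions, and all function names are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, Optional

# Patterns match the line formats seen in the splunkd crash log quoted in this thread.
_SIGNAL_RE = re.compile(r"Received fatal signal (\d+ \([^)]*\))")
_THREAD_RE = re.compile(r"Crashing thread:\s*(\S+)")
_ABORT_RE = re.compile(r"Libc abort message:\s*(.+)")


@dataclass
class CrashSummary:
    signal: Optional[str]     # e.g. "6 (Aborted)"
    thread: Optional[str]     # e.g. "CMAddPeerWorker-5"
    assertion: Optional[str]  # the full libc abort message, if present


def _first_group(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first capture group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def summarize_crash_log(text: str) -> CrashSummary:
    """Extract the fatal signal, crashing thread and abort message from one crash log."""
    return CrashSummary(
        signal=_first_group(_SIGNAL_RE, text),
        thread=_first_group(_THREAD_RE, text),
        assertion=_first_group(_ABORT_RE, text),
    )


def tally_assertions(texts: Iterable[str]) -> Counter:
    """Count how many crash logs share each abort message."""
    counts: Counter = Counter()
    for text in texts:
        key = summarize_crash_log(text).assertion or "<no abort message>"
        counts[key] += 1
    return counts


def read_crash_logs(log_dir: Path, pattern: str = "crash-*.log") -> Iterator[str]:
    """Yield the text of each crash log in log_dir."""
    # Assumption: crash logs are plain-text files matching "crash-*.log".
    for path in sorted(log_dir.glob(pattern)):
        yield path.read_text(errors="replace")
```

For example, assuming `$SPLUNK_HOME` is `/opt/splunk`, `tally_assertions(read_crash_logs(Path("/opt/splunk/var/log/splunk")))` would show whether all of the once-a-minute crashes report the same `hasPeer(peer)` assertion in `CMBucket::setRASummaries`.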


spelunkingsplnk
Splunk Employee

Did you ever figure out this issue? I'm experiencing a very similar problem: I have 3 indexers, I restarted one of them, and the Master node crashed. I even got the same error message as you:

splunkd: /opt/splunk/src/clustering/CMBucket.cpp:962: void CMBucket::setRASummaries(const Guid&, const CMBucketSummaries&): Assertion 'hasPeer(peer)' failed.
