
I have 3 indexers, why does the Master Node crash after one of them restarts?

emzet
Explorer

Hello, 

I have 3 indexers. After one of them was restarted, the Master Node crashes and writes a crash log every minute (each time the indexer tries to connect to the cluster).

The crash log is below:

 

[build cd0848707637] 2022-03-29 17:48:34
Received fatal signal 6 (Aborted) on PID 3183981.
 Cause:
   Signal sent by PID 3183981 running under UID 1004.
 Crashing thread: CMAddPeerWorker-5
 Registers:
    RIP:  [0x00007FDB3792137F] gsignal + 271 (libc.so.6 + 0x3737F)
    RDI:  [0x0000000000000002]
    RSI:  [0x00007FDB121F9860]
    RBP:  [0x00007FDB37A74698]
    RSP:  [0x00007FDB121F9860]
    RAX:  [0x0000000000000000]
    RBX:  [0x0000000000000006]
    RCX:  [0x00007FDB3792137F]
    RDX:  [0x0000000000000000]
    R8:  [0x0000000000000000]
    R9:  [0x00007FDB121F9860]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000246]
    R12:  [0x0000555F4AA9B818]
    R13:  [0x0000555F4A93BC02]
    R14:  [0x00000000000003C2]
    R15:  [0x00007FDB16506238]
    EFL:  [0x0000000000000246]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x002B000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace (PIC build):
  [0x00007FDB3792137F] gsignal + 271 (libc.so.6 + 0x3737F)
  [0x00007FDB3790BDB5] abort + 295 (libc.so.6 + 0x21DB5)
  [0x00007FDB3790BC89] ? (libc.so.6 + 0x21C89)
  [0x00007FDB37919A76] ? (libc.so.6 + 0x2FA76)
  [0x0000555F497B294F] _ZN8CMBucket14setRASummariesERK4GuidRKSt3mapI3Str15CMBucketSummarySt4lessIS4_ESaISt4pairIKS4_S5_EEE + 623 (splunkd + 0x28C694F)
  [0x0000555F496C13C8] _ZN15CMAddPeerWorker15finishAddBucketERP8CMBucketR15BucketCSVStruct + 136 (splunkd + 0x27D53C8)
  [0x0000555F496C2320] _ZN15CMAddPeerWorker19addStandaloneBucketERK13IndexDataTypeR15BucketCSVStruct + 128 (splunkd + 0x27D6320)
  [0x0000555F496C24B3] _ZN15CMAddPeerWorker20processBucketBatchesEv + 291 (splunkd + 0x27D64B3)
  [0x0000555F48757588] _ZN15CMAddPeerWorker4mainEv + 552 (splunkd + 0x186B588)
  [0x0000555F4959B917] _ZN6Thread8callMainEPv + 135 (splunkd + 0x26AF917)
  [0x00007FDB37CB717A] ? (libpthread.so.0 + 0x817A)
  [0x00007FDB379E6DC3] clone + 67 (libc.so.6 + 0xFCDC3)
 Linux / splunk-master-prod-01.local.ad / 4.18.0-240.1.1.el8_3.x86_64 / #1 SMP Fri Oct 16 13:36:46 EDT 2020 / x86_64
 Libc abort message: splunkd: /opt/splunk/src/clustering/CMBucket.cpp:962: void CMBucket::setRASummaries(const Guid&, const CMBucketSummaries&): Assertion `hasPeer(peer)' failed.

 /etc/redhat-release: Red Hat Enterprise Linux release 8.5 (Ootpa)
 glibc version: 2.28
 glibc release: stable
Last errno: 0
Threads running: 103
Runtime: 56.398836s
argv: [splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd]
Regex JIT enabled

RE2 regex engine enabled

using CLOCK_MONOTONIC
Thread: "CMAddPeerWorker-5", did_join=0, ready_to_run=Y, main_thread=N, token=140578878629632
MutexByte: MutexByte-waiting={none}


x86 CPUID registers:
         0: 0000000D 756E6547 6C65746E 49656E69
         1: 000306F0 07040800 FFFA3203 1F8BFBFF
         2: 76036301 00F0B5FF 00000000 00C30000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000000 00000000 00000000 00000000
         6: 00000004 00000000 00000000 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300401 000000FF 00000000 00000000
         B: 00000000 00000000 00000047 00000007
         C: 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000021 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 55504320 2D354520 30383632 20347620
  80000004: 2E322040 48473034 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302B 00000000 00000000 00000000
terminating...

 

And indexer-1 (the one that was rebooted) cannot join the cluster.

Has anyone had this problem, and how did you deal with it?

If more info is needed, I can send it.


spelunkingsplnk
Splunk Employee

Did you ever figure out this issue? I'm experiencing a very similar one: 3 indexers, I restarted 1 of them, and the Master Node crashed. I even got the same error message as you:

splunkd: /opt/splunk/src/clustering/CMBucket.cpp:962: void CMBucket::setRASummaries(const Guid&, const CMBucketSummaries&): Assertion 'hasPeer(peer)' failed.
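To check whether two crash logs really show the same failure, it can help to pull out just the fatal signal, the crashing thread, and the failed assertion and compare those. Below is a minimal sketch in Python. It is not a Splunk tool: the function name `summarize_crash_log` and the regular expressions are my own assumptions, based only on the log format pasted in this thread.

```python
import re

# Patterns are assumptions derived from the crash log pasted in this thread.
SIGNAL_RE = re.compile(r"Received fatal signal (\d+) \((\w+)\)")
THREAD_RE = re.compile(r"Crashing thread:\s*(\S+)")
# The assertion may be quoted as `expr' (glibc style) or 'expr'.
ASSERT_RE = re.compile(r"(\S+):(\d+): .*Assertion [`'](.+?)' failed")


def summarize_crash_log(text):
    """Return a dict with the signal, thread, and assertion found in text.

    Keys are only present when the corresponding line was found.
    """
    summary = {}
    m = SIGNAL_RE.search(text)
    if m:
        summary["signal"] = f"{m.group(1)} ({m.group(2)})"
    m = THREAD_RE.search(text)
    if m:
        summary["thread"] = m.group(1)
    m = ASSERT_RE.search(text)
    if m:
        summary["source_file"] = m.group(1)
        summary["source_line"] = int(m.group(2))
        summary["assertion"] = m.group(3)
    return summary


# A few lines copied from the crash log in the original post.
SAMPLE = """\
Received fatal signal 6 (Aborted) on PID 3183981.
 Crashing thread: CMAddPeerWorker-5
 Libc abort message: splunkd: /opt/splunk/src/clustering/CMBucket.cpp:962: void CMBucket::setRASummaries(const Guid&, const CMBucketSummaries&): Assertion `hasPeer(peer)' failed.
"""

if __name__ == "__main__":
    for key, value in summarize_crash_log(SAMPLE).items():
        print(f"{key}: {value}")
```

Running it over each crash log and comparing the resulting dicts shows quickly whether the thread, source line, and assertion all match. It only summarizes the log; it does not explain why the assertion fails.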
