Re: Indexers BatchAdding problem

m_zandinia · ‎01-09-2022

Hi Splunkers.

I have an indexer cluster, and all of a sudden all of the indexers keep going up and down and get stuck in BatchAdding status.

I have 4 indexers.

These are my settings:

[clustering]
cluster_label = IndexerCluster
mode = master
rebalance_threshold = 0.95
replication_factor = 3
search_factor = 2
restart_timeout = 180
service_interval = 90
heartbeat_timeout = 180
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 20
max_peer_rep_load = 50
max_fixup_time_ms = 0
maintenance_mode = false

I increased max_peer_build_load to speed up my fixup tasks, but it didn't help.

I've been watching the number of buckets, and it increases very slowly.

I see these errors in the splunkd.log file on my indexers:

ERROR ProcessTracker - (child_581__Fsck) BucketBuilder - BucketBuilder::error: Event data size is 0. Raw and Meta data may be missing for bucket="/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301"

WARN  ProcessTracker - (child_601__Fsck)  Fsck - Repair entire bucket, index=eventlog-online-index, tryWarmThenCold=1, bucket=/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301, exists=1, localrc=3, failReason=(entire bucket) Rebuild for bkt='/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301' failed: BucketBuilder::error: Event data size is 0. Raw and Meta data may be missing for bucket="/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301"
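A minimal sketch for collecting the affected bucket paths from log lines like the two above, so they can be counted and inspected. It assumes the message format shown here (a quoted `bucket="..."` following `BucketBuilder::error:`); the function name `failed_bucket_paths` is hypothetical and not part of Splunk.

```python
import re

# Matches the quoted bucket path that follows a BucketBuilder error message.
# Assumption: the message layout is the one shown in the log lines above.
BUCKET_PATH_RE = re.compile(r'BucketBuilder::error:.*?bucket="([^"]+)"')


def failed_bucket_paths(lines):
    """Return the sorted, de-duplicated bucket paths named in BucketBuilder errors."""
    paths = set()
    for line in lines:
        for match in BUCKET_PATH_RE.finditer(line):
            paths.add(match.group(1))
    return sorted(paths)


if __name__ == "__main__":
    sample = [
        'ERROR ProcessTracker - (child_581__Fsck) BucketBuilder - '
        'BucketBuilder::error: Event data size is 0. Raw and Meta data may be '
        'missing for bucket="/Splunk-Storage/HOT/eventlog-online-index/'
        'db_1641702441_1641656220_301"',
    ]
    for path in failed_bucket_paths(sample):
        print(path)
```

In practice the lines would come from reading splunkd.log on each indexer rather than from a hard-coded list.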

In addition, my indexers are continuously writing crash log files like this one:

Received fatal signal 8 (Floating point exception).
 Cause:
   Integer division by zero at address [0x0000557E03DBB1D9].
 Crashing thread: indexerPipe
 Registers:
    RIP:  [0x0000557E03DBB1D9] _ZN12HotDBManager19computeBucketMapKeyERK15CowPipelineData + 121 (splunkd + 0xEF91D9)
    RDI:  [0x00007F43D73836D0]
    RSI:  [0x00007F43ABDAA72D]
    RBP:  [0x00007F43C022EB40]
    RSP:  [0x00007F43C07FD5A0]
    RAX:  [0x07AC58C70206CAB3]
    RBX:  [0x07AC58C70206CAB3]
    RCX:  [0x0000000000000000]
    RDX:  [0x0000000000000000]
    R8:  [0x00000000000000B8]
    R9:  [0x00007F43C8F3E060]
    R10:  [0x00007F43D73867D0]
    R11:  [0x00007F43D6200080]
    R12:  [0x00007F43D7385E08]
    R13:  [0x00007F43C07FD5F0]
    R14:  [0x00007F43C02148E0]
    R15:  [0x00007F43B6C2B500]
    EFL:  [0x0000000000010246]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x002B000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace (PIC build):
  [0x0000557E03DBB1D9] _ZN12HotDBManager19computeBucketMapKeyERK15CowPipelineData + 121 (splunkd + 0xEF91D9)
  [0x0000557E03DBCFDA] _ZN12HotDBManager15_suitableBucketERK15CowPipelineDatalRblR3Str + 410 (splunkd + 0xEFAFDA)
  [0x0000557E03DBF018] _ZN12HotDBManager10suitableDbERK15CowPipelineDatalRblR3Str + 24 (splunkd + 0xEFD018)
  [0x0000557E03E1AF53] _ZN11IndexWriter11_dbLazyLoadERK15CowPipelineDatall + 131 (splunkd + 0xF58F53)
  [0x0000557E03E1C054] _ZN11IndexWriter14write_internalER15CowPipelineDatalRP8DBBucketb + 308 (splunkd + 0xF5A054)
  [0x0000557E03E1C8D7] _ZN11IndexWriter10write_implER15CowPipelineDatalb + 103 (splunkd + 0xF5A8D7)
  [0x0000557E03E1CC43] _ZN11IndexWriter5writeER15CowPipelineDatal + 19 (splunkd + 0xF5AC43)
  [0x0000557E03E1404F] _ZN14IndexProcessor7executeER15CowPipelineData + 3951 (splunkd + 0xF5204F)
  [0x0000557E0433F585] _ZN9Processor20executeMultiLastStepER18PipelineDataVector + 101 (splunkd + 0x147D585)
  [0x0000557E03B2ABCA] _ZN8Pipeline4mainEv + 1418 (splunkd + 0xC68BCA)
  [0x0000557E048FD9D8] _ZN6Thread8callMainEPv + 120 (splunkd + 0x1A3B9D8)
  [0x00007F43D67D6609] ? (libpthread.so.0 + 0x2609)
  [0x00007F43D66FD263] clone + 67 (libc.so.6 + 0xFD263)
 Linux / indexer1-datacenter / 5.4.0-92-generic / #103-Ubuntu SMP Fri Nov 26 16:13:00 UTC 2021 / x86_64
 /etc/debian_version: bullseye/sid
Last errno: 2
Threads running: 72
Runtime: 8.643140s
argv: [splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd]
Regex JIT enabled

RE2 regex engine enabled

using CLOCK_MONOTONIC
Thread: "indexerPipe", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f43c2118e10:
00000000  00 e7 7f c0 43 7f 00 00                           |....C...|
00000008


x86 CPUID registers:
         0: 00000016 756E6547 6C65746E 49656E69
         1: 00050657 08400800 7FFEFBFF BFEBFBFF
         2: 76036301 00F0B5FF 00000000 00C30000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000040 00000040 00000003 00002020
         6: 00000AF7 00000002 00000009 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300404 00000000 00000000 00000603
         B: 00000000 00000000 0000002F 00000008
         C: 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000
         E: 00000000 00000000 00000000 00000000
         F: 00000000 00000000 00000000 00000000
        10: 00000000 00000000 00000000 00000000
        11: 00000000 00000000 00000000 00000000
        12: 00000000 00000000 00000000 00000000
        13: 00000000 00000000 00000000 00000000
        14: 00000000 00000000 00000000 00000000
        15: 00000002 000000F0 00000000 00000000
        16: 00000BB8 00000FA0 00000064 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000121 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 6C6F4720 32362064 20523834 20555043
  80000004: 2E332040 48473030 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302E 00000000 00000000 00000000
terminating...

My OS is Ubuntu server 20.04.

Any suggestions?

Can I bring up one indexer outside the cluster to avoid dropping logs, and then join it to the cluster once the cluster is stable again?

emzet · ‎12-15-2022

If you have an indexer cluster, the bucket names should include a GUID, like this:

db_1641702441_1641656220_301_C7FC9055-53C4-4411-99E8-98FF5BA9E5E3

You can find the indexer's GUID in the instance.cfg file.
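A minimal sketch of that check, assuming the bucket directory layout `db_<newest>_<oldest>_<local id>[_<GUID>]` (with `rb_` for replicated copies) and assuming instance.cfg keeps the GUID as `guid` under a `[general]` stanza. The helper names `bucket_guid` and `read_instance_guid` are hypothetical, not Splunk tooling.

```python
import configparser
import re

# Bucket directory name: db_/rb_ prefix, newest and oldest event times,
# local bucket id, and (in a cluster) the originating indexer's GUID.
BUCKET_RE = re.compile(
    r"^(?P<kind>db|rb)_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<local_id>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}))?$"
)


def bucket_guid(dirname):
    """Return the GUID suffix of a bucket directory name, or None if it has none."""
    match = BUCKET_RE.match(dirname)
    if match is None:
        raise ValueError(f"not a recognised bucket directory name: {dirname!r}")
    return match.group("guid")


def read_instance_guid(cfg_text):
    """Read the guid value from instance.cfg contents (assumed [general] stanza)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("general", "guid")


if __name__ == "__main__":
    # The first name is the one from the error messages in this thread.
    for name in (
        "db_1641702441_1641656220_301",
        "db_1641702441_1641656220_301_C7FC9055-53C4-4411-99E8-98FF5BA9E5E3",
    ):
        print(name, "->", bucket_guid(name))
```

A bucket name for which `bucket_guid` returns `None` has no GUID suffix, which is the case for the bucket named in the error messages above.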

isoutamo · ‎01-09-2022

As your environment isn't working and seems to be crashing continuously, I suggest you open a ticket with Splunk Support as soon as possible, marked as priority one or two. They can help you.

r. Ismo
