Monitoring Splunk

Indexers BatchAdding problem

m_zandinia
Path Finder

Hi Splunkers.

I have an indexer cluster with 4 indexers. All of a sudden, all of them keep going up and down and get stuck in BatchAdding status.

These are my settings:

 

[clustering]
cluster_label = IndexerCluster
mode = master
rebalance_threshold = 0.95
replication_factor = 3
search_factor = 2
restart_timeout = 180
service_interval = 90
heartbeat_timeout = 180
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 20
max_peer_rep_load = 50
max_fixup_time_ms = 0
maintenance_mode = false
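
As a quick sanity check on the stanza above, here is a minimal Python sketch that verifies the two documented constraints on the factors: search_factor must not exceed replication_factor, and replication_factor must not exceed the number of peers. The function name `check_factors` and the trimmed `SERVER_CONF` string are illustrative assumptions, not part of Splunk; the peer count of 4 comes from the post.

```python
import configparser

# Trimmed copy of the [clustering] stanza from the post (only the keys checked here).
SERVER_CONF = """
[clustering]
mode = master
replication_factor = 3
search_factor = 2
"""


def check_factors(conf_text, peer_count):
    """Return a list of human-readable problems; an empty list means none found."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    stanza = parser["clustering"]
    rf = stanza.getint("replication_factor")
    sf = stanza.getint("search_factor")

    problems = []
    if sf > rf:
        problems.append(f"search_factor ({sf}) exceeds replication_factor ({rf})")
    if rf > peer_count:
        problems.append(f"replication_factor ({rf}) exceeds peer count ({peer_count})")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: 4 peers, as stated in the post.
    print(check_factors(SERVER_CONF, peer_count=4))
```

With 4 peers these two checks pass, so the factors themselves are unlikely to be what keeps the peers in BatchAdding.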

 

I increased max_peer_build_load to speed up my fixup tasks, but it didn't help.

I've been watching the bucket count, and it increases very slowly.

I see this error in the splunkd.log file on the indexers:

ERROR ProcessTracker - (child_581__Fsck) BucketBuilder - BucketBuilder::error: Event data size is 0. Raw and Meta data may be missing for bucket="/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301"

 

 

WARN  ProcessTracker - (child_601__Fsck)  Fsck - Repair entire bucket, index=eventlog-online-index, tryWarmThenCold=1, bucket=/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301, exists=1, localrc=3, failReason=(entire bucket) Rebuild for bkt='/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301' failed: BucketBuilder::error: Event data size is 0. Raw and Meta data may be missing for bucket="/Splunk-Storage/HOT/eventlog-online-index/db_1641702441_1641656220_301"
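
To see whether one bucket or many are failing to rebuild, the messages above can be tallied per bucket path. This is a rough Python sketch that only assumes the `bucket="..."` format visible in the lines quoted here; the helper name `failing_buckets` is made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted bucket path at the end of the BucketBuilder messages.
BUCKET_RE = re.compile(r'bucket="([^"]+)"')


def failing_buckets(lines):
    """Count BucketBuilder errors per bucket path."""
    counts = Counter()
    for line in lines:
        if "BucketBuilder::error" not in line:
            continue
        match = BUCKET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    'ERROR ProcessTracker - (child_581__Fsck) BucketBuilder - '
    'BucketBuilder::error: Event data size is 0. Raw and Meta data may be '
    'missing for bucket="/Splunk-Storage/HOT/eventlog-online-index/'
    'db_1641702441_1641656220_301"',
    'INFO  SomethingElse - unrelated line',
]

if __name__ == "__main__":
    for path, count in failing_buckets(SAMPLE).most_common():
        print(count, path)
```

To run it against a real log, pass an open file handle for splunkd.log instead of `SAMPLE`.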

 

In addition, my indexers keep crashing and writing crash.log files like this one:

Received fatal signal 8 (Floating point exception).
 Cause:
   Integer division by zero at address [0x0000557E03DBB1D9].
 Crashing thread: indexerPipe
 Registers:
    RIP:  [0x0000557E03DBB1D9] _ZN12HotDBManager19computeBucketMapKeyERK15CowPipelineData + 121 (splunkd + 0xEF91D9)
    RDI:  [0x00007F43D73836D0]
    RSI:  [0x00007F43ABDAA72D]
    RBP:  [0x00007F43C022EB40]
    RSP:  [0x00007F43C07FD5A0]
    RAX:  [0x07AC58C70206CAB3]
    RBX:  [0x07AC58C70206CAB3]
    RCX:  [0x0000000000000000]
    RDX:  [0x0000000000000000]
    R8:  [0x00000000000000B8]
    R9:  [0x00007F43C8F3E060]
    R10:  [0x00007F43D73867D0]
    R11:  [0x00007F43D6200080]
    R12:  [0x00007F43D7385E08]
    R13:  [0x00007F43C07FD5F0]
    R14:  [0x00007F43C02148E0]
    R15:  [0x00007F43B6C2B500]
    EFL:  [0x0000000000010246]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x002B000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace (PIC build):
  [0x0000557E03DBB1D9] _ZN12HotDBManager19computeBucketMapKeyERK15CowPipelineData + 121 (splunkd + 0xEF91D9)
  [0x0000557E03DBCFDA] _ZN12HotDBManager15_suitableBucketERK15CowPipelineDatalRblR3Str + 410 (splunkd + 0xEFAFDA)
  [0x0000557E03DBF018] _ZN12HotDBManager10suitableDbERK15CowPipelineDatalRblR3Str + 24 (splunkd + 0xEFD018)
  [0x0000557E03E1AF53] _ZN11IndexWriter11_dbLazyLoadERK15CowPipelineDatall + 131 (splunkd + 0xF58F53)
  [0x0000557E03E1C054] _ZN11IndexWriter14write_internalER15CowPipelineDatalRP8DBBucketb + 308 (splunkd + 0xF5A054)
  [0x0000557E03E1C8D7] _ZN11IndexWriter10write_implER15CowPipelineDatalb + 103 (splunkd + 0xF5A8D7)
  [0x0000557E03E1CC43] _ZN11IndexWriter5writeER15CowPipelineDatal + 19 (splunkd + 0xF5AC43)
  [0x0000557E03E1404F] _ZN14IndexProcessor7executeER15CowPipelineData + 3951 (splunkd + 0xF5204F)
  [0x0000557E0433F585] _ZN9Processor20executeMultiLastStepER18PipelineDataVector + 101 (splunkd + 0x147D585)
  [0x0000557E03B2ABCA] _ZN8Pipeline4mainEv + 1418 (splunkd + 0xC68BCA)
  [0x0000557E048FD9D8] _ZN6Thread8callMainEPv + 120 (splunkd + 0x1A3B9D8)
  [0x00007F43D67D6609] ? (libpthread.so.0 + 0x2609)
  [0x00007F43D66FD263] clone + 67 (libc.so.6 + 0xFD263)
 Linux / indexer1-datacenter / 5.4.0-92-generic / #103-Ubuntu SMP Fri Nov 26 16:13:00 UTC 2021 / x86_64
 /etc/debian_version: bullseye/sid
Last errno: 2
Threads running: 72
Runtime: 8.643140s
argv: [splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd]
Regex JIT enabled

RE2 regex engine enabled

using CLOCK_MONOTONIC
Thread: "indexerPipe", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f43c2118e10:
00000000  00 e7 7f c0 43 7f 00 00                           |....C...|
00000008


x86 CPUID registers:
         0: 00000016 756E6547 6C65746E 49656E69
         1: 00050657 08400800 7FFEFBFF BFEBFBFF
         2: 76036301 00F0B5FF 00000000 00C30000
         3: 00000000 00000000 00000000 00000000
         4: 00000000 00000000 00000000 00000000
         5: 00000040 00000040 00000003 00002020
         6: 00000AF7 00000002 00000009 00000000
         7: 00000000 00000000 00000000 00000000
         8: 00000000 00000000 00000000 00000000
         9: 00000000 00000000 00000000 00000000
         A: 07300404 00000000 00000000 00000603
         B: 00000000 00000000 0000002F 00000008
         C: 00000000 00000000 00000000 00000000
         D: 00000000 00000000 00000000 00000000
         E: 00000000 00000000 00000000 00000000
         F: 00000000 00000000 00000000 00000000
        10: 00000000 00000000 00000000 00000000
        11: 00000000 00000000 00000000 00000000
        12: 00000000 00000000 00000000 00000000
        13: 00000000 00000000 00000000 00000000
        14: 00000000 00000000 00000000 00000000
        15: 00000002 000000F0 00000000 00000000
        16: 00000BB8 00000FA0 00000064 00000000
  80000000: 80000008 00000000 00000000 00000000
  80000001: 00000000 00000000 00000121 2C100800
  80000002: 65746E49 2952286C 6F655820 2952286E
  80000003: 6C6F4720 32362064 20523834 20555043
  80000004: 2E332040 48473030 0000007A 00000000
  80000005: 00000000 00000000 00000000 00000000
  80000006: 00000000 00000000 01006040 00000000
  80000007: 00000000 00000000 00000000 00000100
  80000008: 0000302E 00000000 00000000 00000000
terminating...

 

My OS is Ubuntu Server 20.04.

Any suggestions?

Can I bring up one indexer outside the cluster to avoid dropping logs, and then join it to the cluster once the cluster is stable again?


emzet
Explorer

In an indexer cluster, bucket directory names should include a GUID, like this:

db_1641702441_1641656220_301_C7FC9055-53C4-4411-99E8-98FF5BA9E5E3

You can find the indexer's GUID in its instance.cfg file.
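
A small Python sketch of that check: it splits a bucket directory name into its parts and reports whether the GUID suffix is present. The regular expression assumes the `db_<newest>_<oldest>_<id>[_<GUID>]` layout shown above (with `rb_` for replicated copies), and the sample instance.cfg text is a hypothetical stand-in that reuses the GUID from the example name.

```python
import configparser
import re
from datetime import datetime, timezone

BUCKET_RE = re.compile(
    r"^(?P<kind>db|rb)_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<local_id>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}))?$"
)


def parse_bucket_name(name):
    """Split a warm/cold bucket directory name; return None if it does not match."""
    m = BUCKET_RE.match(name)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        # The two numbers are epoch seconds of the newest and oldest event.
        "newest": datetime.fromtimestamp(int(m.group("newest")), tz=timezone.utc),
        "oldest": datetime.fromtimestamp(int(m.group("oldest")), tz=timezone.utc),
        "local_id": int(m.group("local_id")),
        "guid": m.group("guid"),  # None when the name has no GUID suffix
    }


def read_guid(instance_cfg_text):
    """Read the guid key from the [general] stanza of instance.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(instance_cfg_text)
    return parser.get("general", "guid")


# Hypothetical file content for illustration only.
SAMPLE_INSTANCE_CFG = "[general]\nguid = C7FC9055-53C4-4411-99E8-98FF5BA9E5E3\n"

if __name__ == "__main__":
    print("local GUID:", read_guid(SAMPLE_INSTANCE_CFG))
    for name in (
        "db_1641702441_1641656220_301",
        "db_1641702441_1641656220_301_C7FC9055-53C4-4411-99E8-98FF5BA9E5E3",
    ):
        info = parse_bucket_name(name)
        status = "GUID " + info["guid"] if info["guid"] else "no GUID in name"
        print(name, "->", status)
```

The bucket named in the error messages, db_1641702441_1641656220_301, has no GUID suffix, which is what this check would flag.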


isoutamo
SplunkTrust

Since your environment isn't working and the indexers appear to be crashing continuously, I suggest you open a ticket with Splunk Support as soon as possible, with urgency 1 or 2. They can help you.

r. Ismo
