Splunk Enterprise Security

Search Head Crashing - BundleReplicatorThread

konka4
Splunk Employee

Anyone run into this issue before?

Getting this on one of my ES search heads. It crashes roughly every 2 hours. The host has 32GB of RAM, but I only ever see splunkd use about 5.5GB (watching it in htop). Its average bundle size is about 500MB, well below the 2GB max. It's part of an indexer cluster with 10 indexers and 2 CMs (active/standby).

Any help is appreciated.


From Journal

sudo journalctl -u Splunkd --since "2 hour ago"
Sep 19 16:05:15 ##### splunk[3203013]: terminate called after throwing an instance of '15ThreadException'
Sep 19 16:05:15 ##### splunk[3203013]: what(): BundleReplicatorThread: about to throw a ThreadException: pthread_create: Cannot allocate memory; 118 threads active. Trying to create BundleReplThreadPoolWorker-9
Sep 19 16:05:15 ##### systemd-coredump[3277494]: [🡕] Process 3203013 (splunkd) of user 1250 dumped core.
Sep 19 16:05:15 ##### systemd[1]: Splunkd.service: Main process exited, code=dumped, status=6/ABRT

From Splunk Crash Log

Received fatal signal 6 (Aborted) on PID 2156721.
Cause: Signal sent by PID 2156721 running under UID 1250.
Crashing thread: BundleReplicatorThread

Registers:
RIP: [0x00007F81D6E8B94C] ? (libc.so.6 + 0x6394C)
RDI: [0x000000000020E8B1]
RSI: [0x000000000020EB4B]
RBP: [0x000000000020EB4B]
RSP: [0x00007F81AA9FC5C0]
RAX: [0x0000000000000000]
RBX: [0x00007F81AA9FF640]
RCX: [0x00007F81D6E8B94C]
RDX: [0x0000000000000006]
R8: [0x00007F81AA9FC690]
R9: [0x0000000000000003]
R10: [0x0000000000000008]
R11: [0x0000000000000246]
R12: [0x0000000000000006]
R13: [0x00007F81BB6C5D80]
R14: [0x00007F81BB7E3D00]
R15: [0x00007F81A92818D8]

EFL: [0x0000000000000246]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x002B000000000033]
OLDMASK: [0x0000000000000000]

OS: Linux
Arch: x86-64

Backtrace (PIC build):
Linux / ###### / 5.14.0-427.31.1.el9_4.x86_64
#1 SMP PREEMPT_DYNAMIC Fri Aug 9 14:06:03 EDT 2024 / x86_64

C++ exception:
exception_addr=0x7f81a932da40
typeinfo=0x55cc3873e410
name=15ThreadException
what(): BundleReplicatorThread: about to throw a ThreadException:
pthread_create: Cannot allocate memory; 118 threads active.
Trying to create BundleReplThreadPoolWorker-6

/etc/redhat-release:
Red Hat Enterprise Linux release 9.4 (Plow)

glibc version: 2.34
glibc release: stable
Last errno: 12
Threads running: 118
Runtime: 10426.389149s

argv: [splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd]

Regex JIT enabled
RE2 regex engine enabled
using CLOCK_MONOTONIC

Thread: "BundleReplicatorThread", did_join=0, ready_to_run=Y, main_thread=N, token=140194890118720
MutexByte: MutexByte-waiting={none}

x86 CPUID registers:
0: 00000016 756E6547 6C65746E 49656E69
1: 00050657 08010800 FFFA3203 0F8BFBFF
2: 00FEFF01 000000F0 00000000 00000000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000004 00000000 00000000 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 08300801 000000FF 0000000F 00008000
B: 00000000 00000000 0000008F 00000008
C: 00000000 00000000 00000000 00000000
D: 00000000 00000000 00000000 00000000
E: 00000000 00000000 00000000 00000000
F: 00000000 00000000 00000000 00000000
10: 00000000 00000000 00000000 00000000
11: 00000000 00000000 00000000 00000000
12: 00000000 00000000 00000000 00000000
13: 00000000 00000000 00000000 00000000
14: 00000000 00000000 00000000 00000000
15: 00000000 00000000 00000000 00000000
16: 00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000121 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 6C6F4720 33362064 204E3033 20555043
80000004: 2E322040 48473032 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 0000302D 00000200 00000000 00000000

terminating...

 


konka4
Splunk Employee

An update to this.

The cause was the data segment size limit: it had been set to 10GB, and the instance needed at least 18GB. I set it to unlimited, and it has been humming along for at least a week without crashing.

This is what I now have in my ulimits. LimitDATA was set to 10GB; now it's set to infinity.

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=360
LimitCORE=0
LimitNOFILE=65536
LimitNPROC=20480
LimitFSIZE=infinity
LimitDATA=infinity
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=10min
LimitRTPRIO=99
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunk
Group=splunk
Delegate=true
CPUWeight=100
#MemoryMax=33354076160
PermissionsStartOnly=true
ExecStartPost=-/bin/bash -c "chown -R splunk:splunk /sys/fs/cgroup/system.slice/%n"
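
To confirm which data segment limit a running splunkd actually has, the values can be read from /proc. This is a minimal sketch, assuming a Linux host with bash and awk; the helper name data_limits is made up for illustration and is not a Splunk or systemd tool. A possible explanation for the gap between the 5.5GB seen in htop and the 10GB limit: since Linux 4.7, RLIMIT_DATA also applies to mmap(2), which glibc uses to allocate thread stacks, so pthread_create may fail with ENOMEM (errno 12) based on mapped memory rather than resident memory.

```shell
#!/usr/bin/env bash
# data_limits is a hypothetical helper name, not a Splunk or systemd tool.
# It prints the soft and hard "Max data size" limits from a file in the
# format of /proc/<pid>/limits (fields 4 and 5 of the matching row).
data_limits() {
    awk '/^Max data size/ { print "soft=" $4, "hard=" $5 }' "$1"
}

# Usage on a live host (assumes splunkd is running and pgrep is installed):
#   data_limits "/proc/$(pgrep -o splunkd)/limits"
# The value systemd applies to the unit can be read with:
#   systemctl show Splunkd -p LimitDATA
```

After editing the unit file, the new limit only takes effect once systemd has reloaded its configuration (systemctl daemon-reload) and the service has been restarted.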


richgalloway
SplunkTrust

Splunk crashes should be reported to Splunk Support. They can help you better than we can.

---
If this reply helps you, Karma would be appreciated.