A segmentation fault (signal 11) in Splunk can have several potential causes, including memory corruption, insufficient resources, software bugs, or issues with the system configuration. Since you mentioned that you haven't changed anything recently, it's crucial to investigate systematically and rule out potential causes.

Steps to Troubleshoot and Diagnose the Issue:

1. Check Splunk Logs for More Clues

Look at splunkd.log and the crash logs for additional context around the crash. These files can be found in:

$SPLUNK_HOME/var/log/splunk/splunkd.log
$SPLUNK_HOME/var/log/splunk/crash*.log

Run:

grep -i 'fatal' $SPLUNK_HOME/var/log/splunk/splunkd.log
grep -i 'segfault' $SPLUNK_HOME/var/log/splunk/crash*.log

This might provide more context on what was happening before the crash.
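If several crash logs have piled up, a quick sketch for pulling out the most recent one (this assumes $SPLUNK_HOME is set in your shell):

ls -t $SPLUNK_HOME/var/log/splunk/crash*.log | head -1                    # newest crash log
tail -n 60 "$(ls -t $SPLUNK_HOME/var/log/splunk/crash*.log | head -1)"    # backtrace and process details near the end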
2. Validate System Memory and Kernel Overcommit Settings

Your crash log shows "No memory mapped at address", which suggests possible memory issues. Check for kernel memory overcommitting, which can lead to random segmentation faults. Run:

cat /proc/sys/vm/overcommit_memory

If the value is 0, memory overcommit handling is heuristic-based. If it is 1, the system always allows overcommitting memory, which is not recommended for Splunk. If it is 2, overcommit is strict (recommended). If it is set to 1, you could consider changing it to see whether that affects the Splunk service:

echo 2 | sudo tee /proc/sys/vm/overcommit_memory

And persist the change in /etc/sysctl.conf:

vm.overcommit_memory = 2

Also see https://splunk.my.site.com/customer/s/article/Indexer-crashed-after-OS-upgrade
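Equivalently, a small sketch using sysctl to apply, persist and verify the change (the drop-in file name 90-splunk.conf is just an example):

sudo sysctl -w vm.overcommit_memory=2                                      # apply immediately
echo 'vm.overcommit_memory = 2' | sudo tee /etc/sysctl.d/90-splunk.conf    # example drop-in file; persists across reboots
sudo sysctl --system                                                       # reload persistent sysctl settings
sysctl vm.overcommit_memory                                                # confirm the running value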
3. Check Transparent Huge Pages (THP)

THP can cause issues with Splunk's memory management. Disable it if it's enabled. Check the current status:

cat /sys/kernel/mm/transparent_hugepage/enabled

If it says [always], disable it temporarily:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

To make it permanent, add the following to /etc/rc.local:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
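On newer systemd-based distributions /etc/rc.local may not run at boot, so an alternative sketch is a small oneshot unit (the unit name disable-thp.service is just a placeholder, not something Splunk ships):

# example unit name - adjust as needed
sudo tee /etc/systemd/system/disable-thp.service <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled; echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now disable-thp.service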
4. Check ulimits for the Splunk User

If the indexer is running into resource exhaustion, check its ulimits:

ulimit -a

If they are lower than the following, raise them to at least:

nofile = 65536
nproc = 16384

Adjust in /etc/security/limits.conf:

splunk soft nofile 65536
splunk hard nofile 65536
splunk soft nproc 16384
splunk hard nproc 16384
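Keep in mind that limits.conf only applies to PAM login sessions; if splunkd is started by systemd, the limits are set on the unit instead. A sketch of checking what the running process actually has, and of a drop-in override (Splunkd.service is the usual name for a systemd-managed install, but verify the unit name on your host):

cat /proc/$(pgrep -o splunkd)/limits | grep -Ei 'open files|processes'   # effective limits of the running splunkd

sudo mkdir -p /etc/systemd/system/Splunkd.service.d                      # assumes the unit is named Splunkd.service
sudo tee /etc/systemd/system/Splunkd.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65536
LimitNPROC=16384
EOF
sudo systemctl daemon-reload
sudo systemctl restart Splunkd.service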
5. Check Splunk's Memory and CPU Usage

Run:

ps aux --sort=-%mem | grep splunk
free -m
top -o %CPU

Look for excessive memory or CPU consumption.
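A one-off snapshot can miss a spike that happens just before the crash, so one rough option is to keep a running record with a loop along these lines (the log path and 60-second interval are arbitrary choices):

while true; do
    # timestamp plus RSS (KB), CPU% and command line of the largest splunkd process
    echo "$(date '+%F %T') $(ps -C splunkd --sort=-rss -o rss=,pcpu=,args= | head -1)" >> /var/tmp/splunkd-usage.log
    sleep 60
done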
6. Check for Recent Software Updates or Kernel Changes

If the system has undergone automatic updates, it might have introduced compatibility issues. Check recent updates:

cat /var/log/dpkg.log | grep -i "upgrade" # Debian/Ubuntu
cat /var/log/yum.log | grep -i "update" # RHEL/CentOS
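It is also worth confirming whether the running kernel itself changed around the time the crashes started; a quick check along those lines:

uname -r                                           # kernel currently running
last reboot | head -5                              # recent reboots (a new kernel needs one)
rpm -q kernel 2>/dev/null                          # installed kernels on RHEL/CentOS
dpkg -l 'linux-image-*' 2>/dev/null | grep '^ii'   # installed kernels on Debian/Ubuntu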
Please let me know how you get on, and consider accepting this answer or adding karma to it if it has helped.

Regards,
Will