Splunk 7.3.0 : Possibility of SplunkSearchHead not cleaning the jobs in dispatch directory.

Saravanakumar · ‎06-22-2020

Observation:

The SplunkSearchHead suddenly stopped cleaning up jobs in the dispatch directory (/opt/splunk/var/run/splunk/dispatch). We found jobs more than one year old.

As a result:

The /opt partition filled up with a huge number of jobs (nearly 97021) in the dispatch directory. The available free space on the /opt partition dropped below 5 GB.
The dashboard CSV lookup failed. We were not able to perform read operations on the /opt partition.

Action taken:

We manually cleaned the jobs older than one day. The SplunkSearchHead then continued to run normally: (i) it is able to perform CSV lookups, and (ii) it started automatically cleaning jobs older than one hour. We have observed this for the last 10 days. The dispatch directory consistently contains about 201 jobs.

Question:

(1) We are not able to identify the root cause of the sudden build-up of 97021 jobs in the dispatch directory. Can you please help us understand how such a situation can occur?

(2) Searching the Splunk forum, we found a few solutions and partially automated the cleanup with the script below. (a) The logic for cleaning failed jobs seems fine. (b) We are not sure about the right procedure for blindly cleaning the remaining jobs on demand. Is there another mechanism to check the files in a job directory, identify jobs that are no longer needed, and clean them?

(3) Is there any other way to clean them automatically?

Partially automated script for cleaning

#!/usr/bin/env bash

SPLUNK_HOME=/opt/splunk
SPLUNK_DISPATCH_DIR=$SPLUNK_HOME/var/run/splunk/dispatch
CLEAN_UP_OLD_JOBS_IN_DAYS=1

# Clean the failed jobs
# A job is considered a failed search job if its directory in $SPLUNK_DISPATCH_DIR matches
# -mtime +$CLEAN_UP_OLD_JOBS_IN_DAYS and contains neither info.csv nor status.csv.
# Note: find counts age in whole 24-hour periods, so -mtime +1 matches directories
# last modified at least 48 hours ago.
find "$SPLUNK_DISPATCH_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$CLEAN_UP_OLD_JOBS_IN_DAYS" |
while IFS= read -r job;
do
    if [ ! -e "$job/info.csv" ] && [ ! -e "$job/status.csv" ] ; then
        rm -rfv -- "$job"
    fi;
done;

# Clean the Splunk home partition on demand
# Remove every job matching -mtime +$CLEAN_UP_OLD_JOBS_IN_DAYS when the available space on the
# Splunk home filesystem is less than 5 GB and the dispatch directory holds more than 500 jobs.
BLOCK_SIZE_IN_BYTES=1024
FIVE_GB_IN_KILOBYTES=5242880
SPLUNK_JOBS_MAX_COUNT_LIMIT=500
# Column 4 of "df -P" output is the available space, here in 1024-byte blocks (--block-size is a GNU df option).
SPLUNK_HOME_FS_MOUNT_AVAILABLE_SIZE=$(df -P --block-size=$BLOCK_SIZE_IN_BYTES "$SPLUNK_HOME" | tail -1 | awk '{print $4}')
# Count only job directories, since those are what the cleanup below removes.
SPLUNK_JOBS_COUNT=$(find "$SPLUNK_DISPATCH_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l)

if [ "$SPLUNK_HOME_FS_MOUNT_AVAILABLE_SIZE" -lt "$FIVE_GB_IN_KILOBYTES" ] && [ "$SPLUNK_JOBS_COUNT" -gt "$SPLUNK_JOBS_MAX_COUNT_LIMIT" ] ; then
    find "$SPLUNK_DISPATCH_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$CLEAN_UP_OLD_JOBS_IN_DAYS" |
    while IFS= read -r job;
    do
        rm -rfv -- "$job"
    done;
fi
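
Before letting a script like this delete anything, it may help to preview what the failed-job rule would select. The following is only a minimal dry-run sketch of that rule, not a Splunk feature: list_failed_jobs is a hypothetical helper name, and it assumes the same criteria as the script above (directory matched by find -mtime +N, with neither info.csv nor status.csv present).

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the job directories that the failed-job rule would
# remove, without deleting anything.
# list_failed_jobs is a hypothetical helper name, not a Splunk command.
list_failed_jobs() {
    local dispatch_dir=$1   # e.g. /opt/splunk/var/run/splunk/dispatch
    local min_age_days=$2   # N as used in: find -mtime +N
    find "$dispatch_dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$min_age_days" |
    while IFS= read -r job; do
        # Same criterion as the cleanup script: neither file exists.
        if [ ! -e "$job/info.csv" ] && [ ! -e "$job/status.csv" ]; then
            printf '%s\n' "$job"
        fi
    done
}
```

For example, list_failed_jobs /opt/splunk/var/run/splunk/dispatch 1 prints one candidate directory per line, so the list can be reviewed (or counted with wc -l) before the rm step is enabled.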
