Dashboards & Visualizations

Can Splunk help me manage a large quantity of dispatched jobs and queue delays?

damonmanni
Path Finder

Goal

Is there a better/cleaner/best-practice way to manage dispatch cleanup than my current approach (a homegrown script)? The script runs on each search head cluster member (3 of them).

Bonus Goal
Currently I email a simple text report. Instead, I would like to create a Dashboard graphing the trend of script results.

If I stay with my current approach, how can I extract the following data-point variables out of my report, read them into Splunk, and graph them on a dashboard?
The timestamp (TS)
The dispatched job file quantity (CURR_COUNT)
Please see the sample "Cleanup" and "Nothing to clean up" reports below.
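One idea I had for extracting these: have the script append one key=value line per run to a small log file, then index it with a monitor input on each search head. The path and sourcetype below are my own placeholders, not anything that exists yet:

```
# inputs.conf on each search head (path and sourcetype are placeholders)
[monitor:///opt/splunk/scripts/dispatch_metrics.log]
sourcetype = dispatch_cleanup
disabled = false
```

With the script writing lines like `TS="03-29-2018_12-00-05" CURR_COUNT=1153`, Splunk's automatic key=value field extraction should make the count graphable with something like `sourcetype=dispatch_cleanup | timechart span=1h max(CURR_COUNT)`.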

Alternatively, if there is a better approach via queries, reports, graphs, etc., I'm open to it.
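For example, if a query-based approach is better, I assume the `rest` search command could count live dispatch artifacts directly instead of parsing my script output (a sketch; the renamed field is mine):

```
| rest /services/search/jobs count=0 splunk_server=local
| stats count AS dispatched_jobs
```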

Script Effectiveness to date
Unless I am nuking files that I should not be, or there are other logic holes, this script has been effective at keeping customer saved searches running consistently, versus before, when we got lots of complaints.
I have also increased resource limits for both Splunk and the RHEL OS to help throughput.

Cron entry

0 * * * * su - splunk -c /opt/splunk/scripts/cleanup_dispatched_jobs.sh &> /tmp/dispatch.log

Code

#!/bin/bash
###############################################################################
# Damon Manni
# Runs from cron every hour, every day - until a better fix is in place.
# Must run as the splunk user, not root.
###############################################################################

# VARs
GO_BACK="-30m"
CEILING="1000"                                   # My arbitrary threshold to trigger on
SPLUNK_HOME="/opt/splunk"
SCRIPT_ROOT="${SPLUNK_HOME}/scripts"
DISPATCH_DIR="${SPLUNK_HOME}/var/run/splunk/dispatch"
HOLDING_DIR="${SPLUNK_HOME}/old-dispatch-jobs"
CLEANUP_CMD="${SPLUNK_HOME}/bin/splunk cmd splunkd clean-dispatch"
OUTPUT="${SCRIPT_ROOT}/dispatch.out"

# Functions
reset() {
  # Set up for a clean run: remove temp/log/report files.
  rm -f "${SCRIPT_ROOT}"/*.out
}

check_quota() {
  # Simple check: if the dispatched-job count surpasses my arbitrary ceiling,
  # clean up; otherwise wait until the next script run.
  CURR_COUNT="$(ls -1 "${DISPATCH_DIR}" | wc -l)"  # I want to graph CURR_COUNT in a Dashboard. How?
  [ "${CURR_COUNT}" -gt "${CEILING}" ] && cleanup || bow_out
}

bow_out() {
  # All good - nothing to clean up.
  echo "${TS}"
  echo "Current count = ${CURR_COUNT} - no need to cleanup.  Waiting until next job run"
  mail -s "Cleanup Dispatch-${HOSTNAME}: ${CURR_COUNT}" jojo@thedolphin.com < /tmp/dispatch.log
  exit 0
}

cleanup() {
  # Triggered: high volume that can impact the job/parsing queues, etc.
  temp_dir
  ${CLEANUP_CMD} "${HOLDING_DIR}/${TS}" "${GO_BACK}" > "${OUTPUT}" 2>&1
}

gen_ts() {
  TS="$(date +"%m-%d-%Y_%H-%M-%S")"
}

temp_dir() {
  # Create a unique dir to receive a snapshot before cleanup/delete.
  [ ! -d "${HOLDING_DIR}/${TS}" ] && { echo "Creating holding dir..."; mkdir -p "${HOLDING_DIR}/${TS}"; }
}

report() {
  # Data points to help debugging/status.
  echo "${TS}"                                     # I want to graph TS in a Dashboard. How?
  echo "Ceiling = ${CEILING}"
  echo "Current count = ${CURR_COUNT}"             # I want to graph CURR_COUNT in a Dashboard. How?
  echo "Holding dir = ${HOLDING_DIR}/${TS}"
  echo
  cat "${OUTPUT}"
  echo "Tarball = ${HOLDING_DIR}/${TS}.tar.z"
  cat "${OUTPUT}" /tmp/dispatch.log | mail -s "Cleanup Dispatch-${HOSTNAME}: ${CURR_COUNT}" jojo@thedolphin.com
}

squeeze() {
  # Compress the backup to run lean.
  echo "Making tarball for backup..."
  tar zcvf "${HOLDING_DIR}/${TS}.tar.z" "${HOLDING_DIR}/${TS}" \
    && { echo "Done."; rm -rf "${HOLDING_DIR}/${TS}"; } \
    || { echo "Failed."; exit 1; }
}

# Main
echo "${HOSTNAME}"
reset
gen_ts
check_quota
report
squeeze
exit 0
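For the bonus goal, I could extend the script with a small helper that logs TS and CURR_COUNT in key=value form for Splunk to pick up. A sketch (the log path here is a throwaway placeholder; in production it would live under ${SCRIPT_ROOT} and be covered by a monitor input):

```shell
#!/bin/bash
# Hypothetical addition to the cleanup script: append one key=value line
# per run. Splunk auto-extracts key=value pairs at search time, so TS and
# CURR_COUNT become fields I can timechart on a dashboard.
METRICS_LOG="/tmp/dispatch_metrics.log"   # placeholder path for this sketch

log_metrics() {
  echo "TS=\"$(date '+%m-%d-%Y_%H-%M-%S')\" CURR_COUNT=${CURR_COUNT} CEILING=${CEILING} host=${HOSTNAME}" >> "${METRICS_LOG}"
}

# Example invocation using values from the sample Cleanup Report below
CURR_COUNT=1153
CEILING=1000
log_metrics
tail -n 1 "${METRICS_LOG}"
```

The main script would just call `log_metrics` right after `check_quota` sets CURR_COUNT.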

Cleanup Report

Search-Head-member-hostname
Creating holding dir
Using logging configuration at /opt/splunk/etc/log-cmdline.cfg.
03-29-2018_12-00-05
Ceiling = 1000
Current count = 1153
Holding dir = /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
dispatch dir:      /opt/splunk/var/run/splunk/dispatch
destination dir:   /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
earliest mod time: 2018-03-29T11:30:05.000-04:00
total: 1153, moved: 823, failed: 0, remaining: 330 job directories from /opt/splunk/var/run/splunk/dispatch to /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
Tarball = /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05.tar.z

Nothing to clean up report

Search-Head-member-hostname
03-29-2018_15-00-02
Current count = 927 - no need to cleanup.  Waiting until next job run

All help much appreciated as always.
cheers,
Damon
