Dashboards & Visualizations

Can Splunk help me manage a large quantity of dispatched jobs and queue delays?

damonmanni
Path Finder

Goal

Is there a better/cleaner/best practice way to implement my current approach (which is using a homegrown script) to manage dispatch cleanup? My script runs on each Search head member (3 of them).

Bonus Goal
Currently I email a simple text report. Instead, I would like to create a Dashboard graphing the trend of script results.

If I stay with my current approach, how can I extract the following data points out of my report, read them into Splunk, and then graph them on a dashboard?
The timestamp (TS)
The dispatched job file quantity (CURR_COUNT)
Please see sample Cleanup & Nothing to cleanup Reports below
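One common way to get those two data points into Splunk is to have the script append them as a key=value line to a side log that a monitor input can index. A minimal sketch, assuming a hypothetical metrics file (`/tmp/dispatch_metrics.log`) and field names (`ts`, `curr_count`) that are not part of the original script:

```shell
#!/bin/bash
# Hypothetical sketch (not part of the original script): append each run's
# data points as a single key=value line to a side log. The file path and
# field names here are assumptions.
METRICS_LOG="${METRICS_LOG:-/tmp/dispatch_metrics.log}"
TS="$(date +"%m-%d-%Y_%H-%M-%S")"
CURR_COUNT=927   # in the real script this would come from check_quota

# key=value pairs are picked up by Splunk's automatic field extraction
echo "ts=${TS} curr_count=${CURR_COUNT} host=$(hostname)" >> "${METRICS_LOG}"
```

With a `[monitor://...]` input on that file, a simple `timechart` over `curr_count` would give the trend graph, with no report parsing needed.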

Alternatively, if there is a better approach via queries, reports, graphs, etc., that's no issue either.

Script Effectiveness to date
Unless I am nuking files that I should not (or there are other logic holes), this script has been effective at keeping customer savedsearches running consistently, versus the frequent complaints before.
I have also increased resource values for both Splunk and the RHEL OS to help throughput.

Cron entry

*/60 * * * * su - splunk -c /opt/splunk/scripts/cleanup_dispatched_jobs.sh &> /tmp/dispatch.log

Code

#!/bin/bash
###############################################################################
# Damon Manni
# Runs from cron every hour, every day - until a better fix is in place.
# Must run as the splunk user, not root.
###############################################################################

# VARs
GO_BACK="-30m"
CEILING="1000"                                   # My arbitrary threshold to trigger on
SPLUNK_HOME="/opt/splunk"
SCRIPT_ROOT="${SPLUNK_HOME}/scripts"
DISPATCH_DIR="${SPLUNK_HOME}/var/run/splunk/dispatch"
HOLDING_DIR="${SPLUNK_HOME}/old-dispatch-jobs"
CLEANUP_CMD="${SPLUNK_HOME}/bin/splunk cmd splunkd clean-dispatch"
OUTPUT="${SCRIPT_ROOT}/dispatch.out"

# Functions
reset() {
  # Set up for a clean run: temp/log/report files
  rm -rf ${SCRIPT_ROOT}/*.out
}

check_quota() {
  # Simple check: if the dispatched job file volume surpasses the arbitrary
  # ceiling, clean up; otherwise wait until the next script run.
  CURR_COUNT="$(ls -1 ${DISPATCH_DIR} | wc -l)"  # I want to graph CURR_COUNT in a Dashboard. How?
  [ ${CURR_COUNT} -gt ${CEILING} ] && cleanup || bow_out
}

bow_out() {
  # All good
  echo "${TS}"
  echo "Current count = ${CURR_COUNT} - no need to cleanup.  Waiting until next job run"
  cat /tmp/dispatch.log | mail -s "Cleanup Dispatch-${HOSTNAME}: ${CURR_COUNT}" jojo@thedolphin.com
  exit 0
}

cleanup() {
  # Triggered: high volume that can impact job/parsing queues, etc.
  temp_dir
  ${CLEANUP_CMD} ${HOLDING_DIR}/${TS} ${GO_BACK} > ${OUTPUT} 2>&1
}

gen_ts() {
  TS="$(date +"%m-%d-%Y_%H-%M-%S")"
}

temp_dir() {
  # Create a unique dir to receive a snapshot before cleanup/delete
  [ ! -d ${HOLDING_DIR}/${TS} ] && { echo "Creating holding dir..."; mkdir -p ${HOLDING_DIR}/${TS}; }
}

report() {
  # Data points to help debugging/status
  echo "${TS}"                                   # I want to graph TS in a Dashboard. How?
  echo "Ceiling = ${CEILING}"
  echo "Current count = ${CURR_COUNT}"           # I want to graph CURR_COUNT in a Dashboard. How?
  echo "Holding dir = ${HOLDING_DIR}/${TS}"
  echo
  cat "${OUTPUT}"
  echo "Tarball = ${HOLDING_DIR}/${TS}.tar.z"
  cat "${OUTPUT}" /tmp/dispatch.log | mail -s "Cleanup Dispatch-${HOSTNAME}: ${CURR_COUNT}" jojo@thedolphin.com
}

squeeze() {
  # Compress the backup to run lean
  echo "Making tarball for backup..."
  tar zcvf ${HOLDING_DIR}/${TS}.tar.z ${HOLDING_DIR}/${TS} \
    && { echo "Done."; rm -rf ${HOLDING_DIR}/${TS}; } \
    || { echo "Failed."; exit 1; }
}

# Main
echo "${HOSTNAME}"
reset
gen_ts
check_quota
report
squeeze
exit 0
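One small hardening suggestion for check_quota: `ls -1 | wc -l` counts every entry, including any stray files, and can misbehave on odd names. A sketch of a directory-only count with `find` (using a temporary directory to stand in for `$SPLUNK_HOME/var/run/splunk/dispatch`):

```shell
#!/bin/bash
# Sketch: count only per-job directories in the dispatch dir, ignoring
# stray files. The temp dir here is just a stand-in for demonstration.
DISPATCH_DIR="$(mktemp -d)"
mkdir -p "${DISPATCH_DIR}/job_1" "${DISPATCH_DIR}/job_2"
touch "${DISPATCH_DIR}/stray-file"

# -mindepth/-maxdepth 1 restricts to immediate children; -type d to dirs only
CURR_COUNT="$(find "${DISPATCH_DIR}" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')"
echo "curr_count=${CURR_COUNT}"
```

This would drop into check_quota as-is, with the real `DISPATCH_DIR`.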

Cleanup Report

Search-Head-member-hostname
Creating holding dir
Using logging configuration at /opt/splunk/etc/log-cmdline.cfg.
03-29-2018_12-00-05
Ceiling = 1000
Current count = 1153
Holding dir = /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
dispatch dir:      /opt/splunk/var/run/splunk/dispatch
destination dir:   /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
earliest mod time: 2018-03-29T11:30:05.000-04:00
total: 1153, moved: 823, failed: 0, remaining: 330 job directories from /opt/splunk/var/run/splunk/dispatch to /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
Tarball = /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05.tar.z

Nothing to cleanup report

Search-Head-member-hostname
03-29-2018_15-00-02
Current count = 927 - no need to cleanup.  Waiting until next job run
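If the existing report text gets indexed as-is, both data points can be pulled out with regexes; the same patterns could back a Splunk `rex`-style field extraction. A shell sketch against the sample report above:

```shell
#!/bin/bash
# Sketch: extract TS and CURR_COUNT from a "nothing to cleanup" report.
# The report text is copied from the sample above.
REPORT='Search-Head-member-hostname
03-29-2018_15-00-02
Current count = 927 - no need to cleanup.  Waiting until next job run'

# Timestamp matches the script's date format: MM-DD-YYYY_HH-MM-SS
TS="$(echo "${REPORT}" | grep -Eo '[0-9]{2}-[0-9]{2}-[0-9]{4}_[0-9]{2}-[0-9]{2}-[0-9]{2}')"
# Count is the number following "Current count = "
CURR_COUNT="$(echo "${REPORT}" | sed -n 's/.*Current count = \([0-9]*\).*/\1/p')"
echo "ts=${TS} curr_count=${CURR_COUNT}"
```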

All help much appreciated as always.
cheers,
Damon
