Dashboards & Visualizations

Can Splunk help me manage a large quantity of dispatched jobs and queue delays?

damonmanni
Path Finder

Goal

Is there a better/cleaner/best-practice way to implement dispatch cleanup than my current approach (a homegrown script)? The script runs on each of the 3 search head members.

Bonus Goal
Currently I email a simple text report. Instead, I would like to build a dashboard graphing the trend of the script's results.

If I stay with my current approach, how can I extract the following data points from my report, read them into Splunk, and graph them on a dashboard?
The timestamp (TS)
The dispatched job file count (CURR_COUNT)
Please see the sample Cleanup and Nothing-to-cleanup reports below.
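For example, one idea I've sketched (the log path here is an assumption, not something I run today): have the script append one key=value line per run, which Splunk auto-extracts at search time:

```shell
#!/bin/bash
# Hypothetical log path - point it at a file Splunk already monitors.
KV_LOG="/tmp/dispatch_cleanup_kv.log"

TS="$(date +"%m-%d-%Y_%H-%M-%S")"
CURR_COUNT=1153            # in the real script this comes from check_quota()

# One event per run; key=value pairs are auto-extracted at search time.
echo "ts=${TS} host=$(hostname) curr_count=${CURR_COUNT} ceiling=1000" >> "${KV_LOG}"
```

A monitored input on that file plus a timechart over curr_count would then give the trend without parsing the email report.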

Alternatively, if there is a better approach via queries, reports, graphs, etc., that works too.
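For instance, I imagine a dashboard search against the jobs REST endpoint could skip ingesting script output entirely. A sketch (this counts jobs splunkd knows about, which should track but may not exactly match the dispatch directory file count):

```
| rest /services/search/jobs splunk_server=local
| stats count AS dispatched_jobs
```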

Script effectiveness to date
Unless I am nuking files that I should not be (or there are other logic holes), this script has been effective at keeping customer saved searches running consistently; before it, there were lots of complaints.
I have also increased resource limits for both Splunk and the RHEL OS to help throughput.

Cron entry

0 * * * * su - splunk -c /opt/splunk/scripts/cleanup_dispatched_jobs.sh > /tmp/dispatch.log 2>&1

Code

#!/bin/bash
###############################################################################
# Damon Manni
# Runs from cron every hour, every day - until a better fix is in place.
# Must run as the splunk user, not root.
###############################################################################

# VARs
GO_BACK="-30m"
CEILING="1000"                              # My arbitrary threshold to trigger on
SPLUNK_HOME="/opt/splunk"
SCRIPT_ROOT="${SPLUNK_HOME}/scripts"
DISPATCH_DIR="${SPLUNK_HOME}/var/run/splunk/dispatch"
HOLDING_DIR="${SPLUNK_HOME}/old-dispatch-jobs"
CLEANUP_CMD="${SPLUNK_HOME}/bin/splunk cmd splunkd clean-dispatch"
OUTPUT="${SCRIPT_ROOT}/dispatch.out"

# Functions
reset() {
  # Set up for a clean run: remove temp/log/report files.
  rm -rf "${SCRIPT_ROOT}"/*.out
}

check_quota() {
  # Compare the dispatched job file count against the ceiling:
  # clean up now, or wait until the next script run.
  CURR_COUNT="$(ls -1 "${DISPATCH_DIR}" | wc -l)"   # I want to graph CURR_COUNT in a Dashboard. How?
  if [ "${CURR_COUNT}" -gt "${CEILING}" ]; then
    cleanup
  else
    bow_out
  fi
}

bow_out() {
  # All good - nothing to clean up.
  echo "${TS}"
  echo "Current count = ${CURR_COUNT} - no need to cleanup.  Waiting until next job run"
  mail -s "Cleanup Dispatch-${HOSTNAME}: ${CURR_COUNT}" jojo@thedolphin.com < /tmp/dispatch.log
  exit 0
}

cleanup() {
  # Triggered: high volume that can impact the job/parsing queues, etc.
  temp_dir
  ${CLEANUP_CMD} "${HOLDING_DIR}/${TS}" "${GO_BACK}" > "${OUTPUT}" 2>&1
}

gen_ts() {
  TS="$(date +"%m-%d-%Y_%H-%M-%S")"
}

temp_dir() {
  # Create a unique dir to receive a snapshot before cleanup/delete.
  [ ! -d "${HOLDING_DIR}/${TS}" ] && { echo "Creating holding dir..."; mkdir -p "${HOLDING_DIR}/${TS}"; }
}

report() {
  # Data points to help debugging/status.
  echo "${TS}"                                      # I want to graph TS in a Dashboard. How?
  echo "Ceiling = ${CEILING}"
  echo "Current count = ${CURR_COUNT}"              # I want to graph CURR_COUNT in a Dashboard. How?
  echo "Holding dir = ${HOLDING_DIR}/${TS}"
  echo
  cat "${OUTPUT}"
  echo "Tarball = ${HOLDING_DIR}/${TS}.tar.z"
  cat "${OUTPUT}" /tmp/dispatch.log | mail -s "Cleanup Dispatch-${HOSTNAME}: ${CURR_COUNT}" jojo@thedolphin.com
}

squeeze() {
  # Compress the backup so we run lean.
  echo "Making tarball for backup..."
  tar zcvf "${HOLDING_DIR}/${TS}.tar.z" "${HOLDING_DIR}/${TS}"
  [ $? -eq 0 ] && { echo "Done."; rm -rf "${HOLDING_DIR}/${TS}"; } || { echo "Failed."; exit 1; }
}

# Main
echo "${HOSTNAME}"
reset
gen_ts
check_quota
report
squeeze
exit 0
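If I keep the script, my understanding is Splunk could ingest its numbers directly by monitoring a file the script appends to. A minimal inputs.conf stanza (the path, sourcetype, and index names are assumptions for illustration):

```
[monitor:///opt/splunk/scripts/dispatch_metrics.log]
sourcetype = dispatch_cleanup
index = main
disabled = false
```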

Cleanup Report

Search-Head-member-hostname
Creating holding dir
Using logging configuration at /opt/splunk/etc/log-cmdline.cfg.03-29-2018_12-00-05
Ceiling = 1000
Current count = 1153
Holding dir =  /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
dispatch dir:      /opt/splunk/var/run/splunk/dispatch
destination dir:   /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
earliest mod time: 2018-03-29T11:30:05.000-04:00
total: 1153, moved: 823, failed: 0, remaining: 330 job directories from /opt/splunk/var/run/splunk/dispatch to /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05
Tarball = /opt/splunk/old-dispatch-jobs/03-29-2018_12-00-05.tar.z

Nothing-to-cleanup report

Search-Head-member-hostname
03-29-2018_15-00-02
Current count = 927 - no need to cleanup.  Waiting until next job run
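If the report text itself were indexed (e.g., via a monitored file), I imagine the count could be pulled back out at search time. A sketch, assuming a hypothetical sourcetype name of dispatch_cleanup:

```
index=main sourcetype=dispatch_cleanup "Current count"
| rex "Current count = (?<curr_count>\d+)"
| timechart span=1h max(curr_count) AS dispatched_jobs
```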

All help much appreciated as always.
cheers,
Damon
