splunk watchdog

Andrew80k
New Member

How do you enable the watchdog to start on boot?

TIA


Lowell
Super Champion

Update: All of this was written about Splunk 4.x, so keep that in mind.

For what it's worth, I've found the watchdog "feature" to be more trouble than it's worth, at least on Linux; I haven't tried it on other systems. I've run into situations where it started a second instance of Splunk because it thought splunkd was down. That was technically true, I suppose, but splunkd was only down because splunk restart had been run. I ended up with two instances of splunkd running concurrently, which ultimately led to some index corruption. This happened more than once, or I would have written it off as a fluke.

Here is a script I've been using on a forwarder where, for whatever reason, splunkd seems to crash quite frequently (every couple of months it will crash a couple of times in a row).

**WARNING:** Use this script at your own risk. It's only been tested on one system so far and it could possibly do "bad things". It's also pretty dumb, but it does seem to be slightly smarter than the splunkmon process, at least in my experience. Your mileage may vary. You've been warned!

/usr/local/sbin/check_splunkd.sh

#!/bin/bash
# TODO: Add some kind of runlevel checking; make sure we aren't trying to start up while the system is powering down, for example. For now we are just going to risk it.

user=splunk
proc=splunkd
SPLUNK_HOME=/opt/splunk
MAINT_FILE=$SPLUNK_HOME/disabled

LOGGER="logger -t check_splunkd.sh -s"

if [ -f "$MAINT_FILE" ]
then
    echo "Splunk has been shut down for maintenance (remove $MAINT_FILE to re-enable automatic Splunk restarting)." | $LOGGER
    exit 0
fi

# Wait up to 50 seconds to see if splunkd is really down (and not just restarting)
if ! pgrep -u $user $proc > /dev/null
then
    echo "Splunkd appears to be down." | $LOGGER
    i=0
    while ! pgrep -u $user $proc > /dev/null
    do
        let i+=1
        if [ $i -gt 10 ]; then break; fi
        echo "Loop $i"
        sleep 5
    done

    if [ $i -gt 10 ];
    then
        # Splunkd is not running.  Trying to start it up
        echo "Splunkd still not running.  Attempting to start!" | $LOGGER
        su $user -c "$SPLUNK_HOME/bin/splunk start splunkd"
        RETVAL=$?
        echo "Splunk started with RETVAL=$RETVAL" | $LOGGER
    else
        echo "Splunkd is now running.... Perhaps splunk was being restarted  (i=$i)" | $LOGGER
    fi
fi
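
The TODO at the top of the script mentions runlevel checking. Here is a minimal sketch of one way that could look, assuming a SysV-style system where the runlevel command prints the previous and current runlevel (for example "N 3") and where runlevels 0 and 6 mean halt and reboot. The helper name is_shutdown_runlevel is hypothetical and not part of the original script; systemd-only hosts may need a different check.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script).
# Takes the output of `runlevel` (e.g. "N 3") as its only argument and
# returns 0 if the current runlevel is 0 (halt) or 6 (reboot), else 1.
is_shutdown_runlevel() {
    # Keep only the text after the last space: the current runlevel.
    local current="${1##* }"
    case "$current" in
        0|6) return 0 ;;
        *)   return 1 ;;
    esac
}

# Possible use near the top of check_splunkd.sh (assumption, untested there):
#   if is_shutdown_runlevel "$(runlevel 2>/dev/null)"; then
#       echo "System is going down; skipping splunkd check." | $LOGGER
#       exit 0
#   fi
```

If runlevel is missing or prints something unexpected such as "unknown", the helper returns 1 and the check proceeds as before.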

I then schedule this script to run every 10 minutes (which is good enough for my needs):

/etc/crontab

*/10 *  * * *   root    /usr/local/sbin/check_splunkd.sh
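
Since the whole point is to avoid two things starting splunkd at once, it may also be worth making sure two cron runs of the check can never overlap (say, if a start attempt hangs for more than 10 minutes). A minimal sketch using flock from util-linux follows; the run_exclusive function and the lock file path are assumptions of mine, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical lock file location; adjust to taste.
LOCK_FILE="${LOCK_FILE:-/tmp/check_splunkd.lock}"

# Run the given command only if no other run currently holds the lock.
run_exclusive() {
    (
        # fd 9 is opened on the lock file by the redirection below.
        # -n makes flock fail immediately instead of waiting for the lock.
        flock -n 9 || { echo "Another check is still running; skipping."; exit 1; }
        "$@"
    ) 9>"$LOCK_FILE"
}

# Possible use (assumption): wrap the restart logic of check_splunkd.sh
# in a function and call it as
#   run_exclusive restart_splunkd_if_down
```

The lock is released automatically when the subshell exits, so a crashed check does not leave a stale lock behind.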

If you want to temporarily disable this functionality, for example during a Splunk upgrade, simply create the maintenance file (the path must match MAINT_FILE in the script):

touch /opt/splunk/disabled

To re-enable it, simply remove the file:

rm /opt/splunk/disabled

Lowell
Super Champion

Which OS are you running?
