Why is the monit process sometimes restarting Splunk?

mataharry
Communicator

I have Linux servers running Splunk, and I use monit to monitor my processes.

But sometimes I see an issue where monit restarts Splunk unexpectedly.

yannK
Splunk Employee

Sometimes monit fails to read Splunk's pid file and decides too quickly that Splunk is down.

There are several common scenarios:
- Splunk is restarted for no reason (the pid file was being updated by a search process or child processes, and monit gave up too quickly).
- Splunk is started twice during a restart (Splunk deletes the pid file when shutting down, and monit can start it again too quickly, ending up with two Splunk processes and a port conflict).

Please tune your monit logic to retry or wait more cycles before jumping the gun.
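
To illustrate the "retry before jumping the gun" idea, here is a minimal shell sketch. Monit can express this natively with its "for N cycles" condition, so this is only a demonstration of the logic. The function name confirm_down and its arguments are hypothetical, and the stub commands true and false stand in for a real status check.

```shell
#!/bin/bash
# Sketch: treat a service as down only when a check fails several times in a row.
# confirm_down is a hypothetical helper, not part of monit or Splunk.
# Usage: confirm_down ATTEMPTS DELAY_SECONDS CHECK_COMMAND [ARGS...]
confirm_down() {
    local attempts=$1 delay=$2 i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # A single successful check means the service is up: stop retrying.
        if "$@" >/dev/null 2>&1; then
            return 1
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # Every attempt failed.
    return 0
}

# Stub checks: "false" always fails, "true" always succeeds.
if confirm_down 3 0 false; then echo "down after 3 failed checks"; fi
if ! confirm_down 3 0 true; then echo "still considered up"; fi
```

With a real check, the call might look like `confirm_down 3 10 /opt/splunk/bin/splunk status`, assuming Splunk is installed under /opt/splunk.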

awyszkowski
Splunk Employee

Here are some pointers for a "real world" Splunk process monitor in Monit that starts Splunk when 'splunk status' reports it down.

First off, we want better downtime detection. Do away with pid checking and port checking: these often lead to confusion, since pids can be somewhat fluid with Splunk. Normal operation of Splunk can also involve a restart (a rolling restart, an administrative restart invoked from the GUI after installing an app, etc.). It is best to use Splunk's own "splunk status" command and rely on its exit status (0 means Splunk is running; any other status means it is not, or that there was a problem determining the state).

Secondly, monit tends to want to stop the service before restarting it. This can lead to ugliness if Splunk was actually running. So rather than using restart logic, just use 'splunk start' to get it going again ('splunk start' is effectively a no-op if Splunk is already running, as opposed to a stop followed by a start).

Note: this does no alerting. It merely starts Splunk when it is detected down in two consecutive checks 5 minutes apart (you might have to tweak the settings if your global monit polling frequency is different).

Assuming you have the following setting in /etc/monitrc

# Polling frequency
set daemon 20

In /etc/monit/splunk_health.sh (new file)

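#!/bin/bash
# Capture the output and exit status of "splunk status" (0 means Splunk is running).
TEXT=$(/opt/splunk/bin/splunk status 2>&1)
STATUS=$?
# Write the status text to stderr, then hand the exit status back to monit.
>&2 echo "$TEXT"
exit $STATUS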

In /etc/monit/conf.d/splunk.monitrc (probably a new file)

check program splunkd with path "/etc/monit/splunk_health.sh" every 15 cycles
    start program = "/usr/sbin/service splunk start"
    stop program = "/usr/sbin/service splunk stop"
    if status != 0 for 2 cycles then start
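
As a quick sanity check of the timing, the arithmetic behind "5 minutes apart" can be sketched in shell. The variable names are made up for illustration, and the calculation assumes, as the note above describes, that "for 2 cycles" counts two consecutive runs of the health script.

```shell
#!/bin/bash
# Values copied from the configuration above.
daemon_seconds=20   # "set daemon 20"
every_cycles=15     # "every 15 cycles"
failed_checks=2     # "for 2 cycles"

# Time between two runs of the health script.
check_interval=$((daemon_seconds * every_cycles))

# Time between the first failed check and the one that triggers the start
# (assumption: only cycles in which the script actually runs are counted).
confirm_gap=$(( (failed_checks - 1) * check_interval ))

echo "health script runs every ${check_interval}s"
echo "start is triggered ${confirm_gap}s after the first failed check"
```

If your "set daemon" value differs, adjust "every N cycles" so that the product stays close to the check interval you want.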
