Hello, I am having issues with my Splunk Universal Forwarders.
Problem: The Splunk Universal Forwarders are not upgrading from version 7.2.6 to version 8 using the custom app I developed. The custom app is a replica of the one that works for 7.2.6, with exactly the same features. However, once it shuts the forwarder down, it neither upgrades nor restarts it. Here is the custom app.
#!/bin/bash
# set splunk path
SPLUNK_HOME=/opt/splunkforwarder
# set desired version
NVER=8.2.2
# determine current version
CVER=`cat $SPLUNK_HOME/etc/splunk.version | grep VERSION | cut -d= -f2`
if [ "$NVER" != "$CVER" ]
then
echo "Upgrading Splunk to $NVER."
$SPLUNK_HOME/bin/splunk stop
tar -xvf $SPLUNK_HOME/etc/apps/splunk_upgrade_lin_v8/static/splunkforwarder-8.2.2-87344edfcdb4-Linux-x86_64.tgz -C /opt
$SPLUNK_HOME/bin/splunk start --accept-license --answer-yes
fi
The static folder contains splunkforwarder-8.2.2-87344edfcdb4-Linux-x86_64.tgz.
In the bin directory, the script above is upgrade.sh, and the wrapper.sh I created points to this upgrade.sh.
In the local directory, this is what I have listed:
[script://./bin/wrapper.sh]
disabled = false
interval = 3600
sourcetype = upgrade_linuxv8
Once again, this custom app works completely fine with 7.2.6. With any later version, once the app is assigned to the client, the Splunk forwarder shuts down and doesn't come back until I force-remove the app (rm -rf) and restart Splunk. Does anyone have a workaround for this?
Firstly, an obligatory warning: Splunk does not support self-invoked updates. You can brick your forwarders using remote update scripts like this. Consider yourself warned!
That being said, I can recreate your issue.
Ubuntu 22.04.
Splunk UF version 7.2.6 (x86_64) tgz.
I copied your scripts like-for-like into an app. What I'm observing is that the UF enters a loop of being up and down because the script runs instantly and then very quickly stops the UF. The process loops because the upgrade is not working.
The issue preventing the upgrade is the tar command failing to unpack the new Splunk tgz. In my recreation, tar returns exit code 141, which corresponds to the SIGPIPE signal (141 - 128 = 13): tar is writing to a pipe with no reader.
In this case, tar is writing to STDOUT because of the -v (verbose) argument in your script. When upgrade.sh is invoked with a double-fork, it inherits the file descriptors of wrapper.sh. These are pipes set up by Splunk to redirect the script's output to the internal mechanisms that deal with event forwarding. After we stop Splunk, upgrade.sh's STDOUT pipe has no reader, and when tar tries to write to it, a SIGPIPE signal is raised, which causes tar to exit.
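To make the exit-code arithmetic concrete, here is a small sketch that decodes a shell exit status. `describe_status` is a hypothetical helper name, not part of Splunk or of the scripts in this thread; it only relies on the shell convention that a status above 128 means the process was terminated by signal `status - 128`.

```shell
#!/bin/bash
# describe_status is a hypothetical helper (not part of Splunk or the scripts above).
# Shells report "terminated by signal N" as exit status 128 + N.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # kill -l N prints the signal name without the SIG prefix
        echo "killed by signal $signum (SIG$(kill -l "$signum"))"
    else
        echo "exited with code $status"
    fi
}

describe_status 141
```

On Linux this prints `killed by signal 13 (SIGPIPE)`, which matches the status tar returned in the recreation.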
The simplest way to get around this is to either remove the -v argument from tar or (the safer option) redirect the STDOUT and STDERR when invoking upgrade.sh as follows:
( /opt/splunkforwarder/etc/apps/splunk_upgrade_lin_v8/bin/upgrade.sh & ) > /dev/null 2>&1
Note that this will prevent the echo in upgrade.sh from reaching Splunk, so you might want to move it to wrapper.sh.
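As a rough sketch of that arrangement, the wrapper could keep the status message and launch the upgrade fully detached. `launch_detached` is a hypothetical helper name, and redirecting STDIN as well is an extra precaution that was not part of the tested fix.

```shell
#!/bin/bash
# Sketch of a wrapper.sh that keeps the status message but detaches the upgrade.
# launch_detached is a hypothetical helper name, not something Splunk provides.
launch_detached() {
    # Double-fork: the subshell exits immediately, so the command is
    # re-parented and holds none of the pipes Splunk opened for the wrapper.
    ( "$@" </dev/null >/dev/null 2>&1 & )
}

# Intended use in wrapper.sh (commented out so the sketch is safe to run as-is):
# echo "Launching Splunk upgrade script."   # still written to Splunk's pipe
# launch_detached /opt/splunkforwarder/etc/apps/splunk_upgrade_lin_v8/bin/upgrade.sh
```

The echo runs in the wrapper while Splunk is still up, so it can be indexed, while nothing that upgrade.sh runs later can write to a pipe whose reader has gone away.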
A word to the wise: going full circle, this is exactly what I was referring to at the start of this answer. Remote upgrades like this can brick the UF in unpredictable ways. This app would have repeatedly taken the UF down before it had a chance to phone home to your deployment server(s), preventing a roll-back. If you really do wish to push forward with this method, consider changing your script interval to a cron schedule. This will prevent the rapid looping and allow the UF to start up and connect to your DS if you need to perform a roll-back, e.g.:
interval = 10 * * * *
Also, please consider adding more logging (to a log file) and error checking to your upgrade.sh too.
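A minimal sketch of what such checks might look like follows. All three helper names (`read_version`, `archive_ok`, `run_logged`) are hypothetical, and the sketch assumes splunk.version contains a `VERSION=x.y.z` line, as the original grep implies.

```shell
#!/bin/bash
# Hypothetical helpers for upgrade.sh; the names are illustrative only.

# Print the VERSION value from a splunk.version-style file (VERSION=x.y.z).
read_version() {
    sed -n 's/^VERSION=//p' "$1"
}

# Succeed only if the archive exists and tar can list its contents.
archive_ok() {
    [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Run a command, append its output and exit status to a log file,
# and return that status to the caller.
run_logged() {
    log_file=$1; shift
    "$@" >>"$log_file" 2>&1
    rc=$?
    echo "$(date '+%F %T') '$*' exited with $rc" >>"$log_file"
    return $rc
}
```

In upgrade.sh, `archive_ok` could be called before `splunk stop`, so that a missing or corrupt tgz never takes the forwarder down, and the stop, tar, and start steps could each be wrapped in `run_logged` with the script aborting on the first non-zero status.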
Let me know how you get on, this has been an interesting problem to debug!
I noticed in the upgrade scripts, you run
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk start
What if systemd is being used? Every time I try splunk stop or start, I get this:
Shutting down. Please wait, as this may take a few minutes.
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: --whatever user--
Password:
How do I get past this to automate upgrades that use systemd?
I've made some progress. I removed the v from tar -xvf, leaving tar -xf, and the install then worked; I just had to start the server manually. Now I need to figure out how to get --accept-license accepted again. So it's good to see progress. However, when I changed the wrapper script to use the line below and added an echo for starting Splunk, it stopped and didn't run the update.
( /opt/splunkforwarder/etc/apps/splunk_upgrade_lin_v8/bin/upgrade.sh & ) > /dev/null 2>&1
Adding the echo
Do I need to add the entire if then statement into the wrapper.sh?
#!/bin/bash
# set splunk path
SPLUNK_HOME=/opt/splunkforwarder
# set desired version
NVER=8.2.2
# determine current version
CVER=`cat $SPLUNK_HOME/etc/splunk.version | grep VERSION | cut -d= -f2`
if [ "$NVER" != "$CVER" ]
then
echo "Upgrading Splunk to $NVER."
$SPLUNK_HOME/bin/splunk stop
tar -xf $SPLUNK_HOME/etc/apps/splunk_upgrade_lin_v8/static/splunkforwarder-8.2.2-87344edfcdb4-Linux-x86_64.tgz -C /opt
$SPLUNK_HOME/bin/splunk start --accept-license --answer-yes
fi
Interesting, let me try to recreate the issue, as this was working fine in an Ubuntu VM.
What OS are you testing on?
Are you using SELinux?
Are you using systemd or initd?
I am using systemd and Red Hat OS
Can you share the output of the following commands so that I can recreate your environment and help you to debug:
grep ^VERSION= /etc/os-release
getenforce
/opt/splunkforwarder/bin/splunk display boot-start
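If running these one at a time is awkward, the three commands can be collected in a single pass. This is only a sketch: `run_or_note` and `collect_env` are hypothetical helper names, and the `ps -p 1 -o comm=` check of the init process is an addition that was not in the original request.

```shell
#!/bin/bash
# Collect the requested details in one pass.
# run_or_note and collect_env are hypothetical helper names.
run_or_note() {
    echo "== $*"
    if command -v "$1" >/dev/null 2>&1; then
        # stdin is detached so a command can never wait for input
        "$@" </dev/null 2>&1
    else
        echo "(command not found: $1)"
    fi
}

collect_env() {
    run_or_note grep '^VERSION=' /etc/os-release
    run_or_note getenforce
    run_or_note /opt/splunkforwarder/bin/splunk display boot-start
    # Addition: the name of PID 1 distinguishes systemd from a SysV-style init
    run_or_note ps -p 1 -o comm=
}
```

Calling `collect_env > /tmp/splunk_env.txt 2>&1` would write everything to one file that can be pasted into a reply; a command that is not installed (for example getenforce on a system without SELinux tools) is noted rather than aborting the run.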
Weirdly enough, I am having issues running those commands. Sorry for the delay.
Can I clarify: does the entire if/then statement need to be added to wrapper.sh?
The only two things that I modified to get the installer to work were:
a) remove the -v switch from tar
b) pipe the forked-shell's output to /dev/null as per my answer
Either of these should work. Nothing else needed to be changed to get this to work on Splunk UF 7.2.6 (Ubuntu 22.04), including the location of the if then statement. Entire wrapper.sh:
#!/bin/bash
( /opt/splunkforwarder/etc/apps/splunk_upgrade_lin_v8/bin/upgrade.sh & ) > /dev/null 2>&1
I've asked for some more details, can you please share them when you get a chance?
Another update after testing: I realize that the issue may be with tar. Removing -v worked for unpacking, but Splunk did not start. I had to go into the bin directory and manually accept the license. The server automatically shuts down afterwards. I assume it's still in a loop.
I've done some more testing, this time on CentOS, and I'm experiencing similar symptoms to yours again.
$SPLUNK_HOME/bin/splunk start --accept-license --answer-yes is exiting with a 141 (just like tar was on Ubuntu), so it's failing to start. There must be a subtle difference in the way Splunk starts on the two OSs that explains why it started without a problem on Ubuntu.
That being said, the change to wrapper.sh from my original answer (redirecting the STDOUT and STDERR away from Splunk) solved this issue again. Based on your original scripts, here is what is working for me:
wrapper.sh
#!/bin/bash
( /opt/splunkforwarder/etc/apps/splunk_upgrade_lin_v8/bin/upgrade.sh & ) > /dev/null 2>&1
upgrade.sh
#!/bin/bash
# set splunk path
SPLUNK_HOME=/opt/splunkforwarder
# set desired version
NVER=8.2.2
# determine current version
CVER=`cat $SPLUNK_HOME/etc/splunk.version | grep VERSION | cut -d= -f2`
if [ "$NVER" != "$CVER" ]
then
echo "Upgrading Splunk to $NVER."
$SPLUNK_HOME/bin/splunk stop
tar -xvf $SPLUNK_HOME/etc/apps/splunk_upgrade_lin_v8/static/splunkforwarder-8.2.2-87344edfcdb4-Linux-x86_64.tgz -C /opt
$SPLUNK_HOME/bin/splunk start --accept-license --answer-yes
fi
If you still can't get it to work, let's add some debugging to a /tmp/ file to check what's going on. Here is a modified upgrade.sh that will output some more verbose logs.
upgrade.sh (with logging)
#!/bin/bash
LOG=/tmp/splunk_upgrade.log
echo "Starting Splunk Upgrade." >> $LOG
# set splunk path
SPLUNK_HOME=/opt/splunkforwarder
# set desired version
NVER=8.2.2
# determine current version
CVER=`cat $SPLUNK_HOME/etc/splunk.version | grep VERSION | cut -d= -f2`
echo "Current Version: $CVER, Target Version: $NVER" >> $LOG
if [ "$NVER" != "$CVER" ]
then
echo "Proceeding with upgrade." >> $LOG
echo "Upgrading Splunk to $NVER."
echo "Stopping Splunk." >> $LOG
$SPLUNK_HOME/bin/splunk stop 2>>$LOG
echo "Stopping Splunk returned exit code: $?." >> $LOG
echo "Unpacking Splunk." >> $LOG
tar -xvf $SPLUNK_HOME/etc/apps/splunk_upgrade_lin_v8/static/splunkforwarder-8.2.2-87344edfcdb4-Linux-x86_64.tgz -C /opt 2>>$LOG
echo "Unpacking Splunk returned exit code: $?." >> $LOG
echo "Starting Splunk." >> $LOG
$SPLUNK_HOME/bin/splunk start --accept-license --answer-yes 2>>$LOG
echo "Starting Splunk returned exit code: $?." >> $LOG
fi
echo "Complete." >> $LOG
Please share a copy of these logs and we'll see if there's anything different happening on your system.
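Note that the logging version above appends only STDERR of each command (`2>>$LOG`); STDOUT still goes wherever upgrade.sh inherited it from. One possible alternative, sketched here under the assumption that losing the output to Splunk is acceptable, is to redirect everything once at the top of the script with `exec`. `redirect_all_output` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: send everything this shell writes from now on to a log file
# and detach stdin, so no later command touches an inherited pipe.
redirect_all_output() {
    exec >>"$1" 2>&1 </dev/null
}

# Demonstrated in a subshell so the calling shell's own output is left alone.
demo_log=$(mktemp)
(
    redirect_all_output "$demo_log"
    echo "stdout line"
    echo "stderr line" >&2
)
cat "$demo_log"
rm -f "$demo_log"
```

In upgrade.sh the call would go right after `LOG=` is set, after which tar -v and the splunk commands would all write to the log file without any per-line redirection.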
Update on this task: I looked in splunkd_stderr.log. This is logged each time I activate the custom upgrade app.
2023-03-27 14:45:09.335 -0500 Interrupt signal received sent by PID 5126, command="/opt/splunkforwarder/bin/splunk stop" (UID 59867, same as my group)
2023-03-24 19:42:11.991 -0500 Interrupt signal received sent by PID 18532, command="/opt/splunkforwarder/bin/splunk stop" (UID 59867, same as my group)
2023-03-24 19:46:48.324 -0500 splunkd started (build 08187535c166) pid=19122
2023-03-24 19:46:54.620 -0500 Interrupt signal received sent by PID 19218, command="/opt/splunkforwarder/bin/splunk stop" (UID 59867, same as my group)
I removed the > /dev/null 2>&1 from wrapper.sh and ran the upgrade manually. It worked fine, and splunkd continues to run. The automated process, however, is what's not working, which is interesting. Leaving the > /dev/null 2>&1 in wrapper.sh causes the manual upgrade to fail as well. So it seems like
With the script that you updated, no logs are being written to the folder. That directory isn't being created either, which is very odd. The wrapper is just as you stated. It installs Splunk and accepts the license. However, it will not stay up now, which is the odd part.
Splunk continues to shut down each time I restart it.
I am unable to provide the information. My VM freezes in the process.
Can you share your wrapper.sh please?
#!/bin/bash
( /opt/splunkforwarder/etc/apps/splunk_upgrade_lin_v8/bin/upgrade.sh & )