Splunk forwarder container 9.0.5 not initializing ...

NevTheRev · ‎07-05-2023

Greetings !

We've been using version 8.1.12 of the forwarder container for years and need to move to version 9. I haven't been able to get the new version running: the container does not finish initializing and never forwards logs.

Most recently employing docker.io/splunk/universalforwarder:latest
Digest: sha256:88fb1a2b8d4f47bea89b642973e6502940048010cd9ed288c713ac3c7d079a82
Our deployment is an unmodified image.

The container launches but on closer inspection (by opening a shell into the container) I can see it's hanging on the splunk status command (from ps -ef):
/opt/splunkforwarder/bin/splunk status --accept-license --answer-yes --no-prompt

If I run the same command (as above), I can see that it prompts on the following:

Perform migration and upgrade without previewing configuration changes? [y/n]

Answering "y" seems to move things along and it responds (with lots more lines):

"-- Migration information is being logged to '/opt/splunkforwarder/var/log/splunk/migration.log.2023-07-05.13-55-37' --"

After this, I can manually start the splunk forwarder !

Is there "something" I can do so that it passes through this step without prompting?

Here's some background if it helps:

We're using the same Azure Kubernetes service (AKS) 1.26.3 as before with Splunk forwarder 8.1

We're mapping in the following files:

/opt/splunk/etc/auth/sunsuper/splunkclient.chain
/opt/splunk/etc/auth/sunsuper/splunkclient.pem
/opt/splunkforwarder/etc/system/local/outputs.conf
/opt/splunkforwarder/etc/apps/ta-inspire/local/server.conf
/opt/splunkforwarder/etc/apps/ta-inspire/local/inputs.conf

and launching the container with the same (YAML) environment as before:

env:
  - name: TZ
    value: Australia/Brisbane
  - name: SPLUNK_START_ARGS
    value: '--accept-license --answer-yes --no-prompt'
  - name: SPLUNK_USER
    value: root
  - name: SPLUNK_FORWARD_SERVER
    value: fwdhost.probably.com.au:9997
  - name: SPLUNK_FORWARD_SERVER_ARGS
    value: >-
      -ssl-cert-path /opt/splunk/etc/auth/sunsuper/splunkclient.pem
      -ssl-root-ca-path
      /opt/splunk/etc/auth/sunsuper/splunkclient.chain -ssl-password
      secret -ssl-common-name-to-check
      fwdhost.probably.com.au -ssl-verify-server-cert false -auth
      admin:secret
  - name: ENVIRONMENT
    value: UNIT
  - name: SPLUNK_PASSWORD
    value: secret
  - name: SPLUNK_STANDALONE_URL
    value: fwdhost.probably.com.au:9997

Many thanks,
Nev

NevTheRev · ‎07-08-2023

Thanks for your input @reinier_post. I don't think it's container related; it seems more related to the packaged files and what "splunk status" does on its first run. (I may have a workaround that can assist you.)

Thanks also @isoutamo, I have tried the "tty" fix and it made no difference to the deployment stopping at the "splunk status" step.

What I have done as a workaround (read "hack") was to determine what "splunk status" does in the /opt/splunkforwarder/ directory. As I mentioned, I could run the command manually and it would prompt, while subsequent runs would execute without prompting. So what changes? I captured the file listing (with time stamps) before manually running the "splunk status" command:

find . -exec ls -ld {} \; > /tmp/list_before_status.txt

and after I'd manually run the command:

find . -exec ls -ld {} \; > /tmp/list_after_status.txt

I compared the lists in "WinMerge" ("diff" is a little harder to visualize) and here's what I established:

(1) It deletes ./ftr (does that stand for "First Time Run" ? Is it a flag for "splunk status"?)
(2) It creates in the /opt/splunkforwarder directory:

var/log/splunk/migration.log.2023-07-07.16-47-57 (file)
var/run (a bunch of empty folders)
var/lib (a bunch of empty folders)
var/spool (a bunch of empty folders)
etc/apps/ta-inspire/metadata (it took a sniff of my app and created this directory)
etc/system/local/migration.conf (file)
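
The before/after comparison can also be done directly in the shell instead of WinMerge. The following is a minimal sketch, not what was actually run in this thread: list_paths and compare_listings are hypothetical helper names, and it only reports paths that were created or removed (not files whose time stamps changed).

```shell
#!/usr/bin/env bash
# Sketch only: list_paths and compare_listings are hypothetical helpers.

# Print every path under the given directory, sorted so that two
# listings can be compared line by line.
list_paths() {
    (cd "$1" && find . | LC_ALL=C sort)
}

# Compare two listings produced by list_paths.
# Paths only in the first listing are printed with "- " (removed),
# paths only in the second listing with "+ " (created).
compare_listings() {
    LC_ALL=C comm -23 "$1" "$2" | sed 's/^/- /'
    LC_ALL=C comm -13 "$1" "$2" | sed 's/^/+ /'
}
```

Usage would look like this: run `list_paths /opt/splunkforwarder > /tmp/paths_before.txt`, run "splunk status" manually and answer the prompt, run `list_paths /opt/splunkforwarder > /tmp/paths_after.txt`, then `compare_listings /tmp/paths_before.txt /tmp/paths_after.txt`.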

My solution (yes, "hack") was to capture these files/directories and put them into my Docker build and to delete the "ftr" directory.

I created a tar archive:

tar cf /tmp/Splunk905_things.tar \
    var/log/splunk/migration.log.2023-07-07.16-47-57 \
    var/run \
    var/lib \
    var/spool \
    etc/apps/ta-inspire/metadata \
    etc/system/local/migration.conf

I copied the tar archive file out of the running container and saved it to my Docker build directory.
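
For anyone repeating this, here is a small sketch of the archive step that creates the tar file and then lists its contents, so you can confirm what went in before copying it out. pack_paths is a hypothetical helper name; the paths you pass are whatever your own before/after comparison turned up, which may differ from mine.

```shell
#!/usr/bin/env bash
# Sketch only: pack_paths is a hypothetical helper.

# Usage: pack_paths ROOT ARCHIVE PATH...
# Archives the given paths (relative to ROOT) into ARCHIVE,
# then prints the archive's table of contents for a visual check.
pack_paths() {
    local root=$1 archive=$2
    shift 2
    tar -C "$root" -cf "$archive" "$@"   # c = create
    tar -tf "$archive"                   # t = list contents
}
```

For example: `pack_paths /opt/splunkforwarder /tmp/Splunk905_things.tar var/run var/lib var/spool etc/system/local/migration.conf`. Empty directories are kept in the archive, so they are recreated when it is unpacked in the image build.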

In my Dockerfile I wanted to include these steps:

(1) Update the image (always good practice)
(2) Remove the "ftr" folder
(3) Create the files/directories that "splunk status" would create on first run

Here it is:

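FROM splunk/universalforwarder:latest
LABEL authors="nev"
USER root:root
# Files and directories that "splunk status" created on its first run
COPY ./Splunk905_things.tar /tmp/Splunk905_things.tar
# Update packages, remove the "ftr" folder, then unpack the captured files
RUN microdnf update -y \
    && cd /opt/splunkforwarder \
    && rm -rf ftr \
    && tar xvf /tmp/Splunk905_things.tar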

I was able to build this image, deploy it and restart the container which deployed normally and began forwarding logs as expected.

That is, it works !!

Disclaimer: I have no idea what the first run of "splunk status" does when it's executed from the Ansible playbook as part of the deployment of Splunk forwarder 9.0.5. I may have captured my situation, but it might fall short for yours. Maybe it was only the "ftr" folder that was holding this up. I followed a hunch: the first time, "splunk status" had to prompt and the second time it didn't, so why not make the deployment look like the second run?

In any case, if you pass the options "--no-prompt --answer-yes" to a command, you don't want it to stop and ask whether or not you want to migrate your brand new deployment. (bug?)

reinier_post · ‎07-11-2023

Thank you Nev, this is very interesting, but I don't want to rely on a hack like this in production unless it's sanctioned by Splunk support.

isoutamo · ‎07-09-2023

Have you tried running this first:

splunk start --accept-license --answer-yes --no-prompt

before you run

splunk status ...

The first one wants you to authenticate; the second one doesn't need that, if I recall correctly.

reinier_post · ‎07-09-2023

Yes, I tried that, too: no difference. With start, it hangs as well.

reinier_post · ‎07-06-2023

I'm running into the exact same issue: after upgrading a Universal Forwarder from 8.1.2 my Ansible role is issuing

bin/splunk status --no-prompt --answer-yes --accept-license

to make sure Splunk won't hang later, and this command is hanging waiting for input.

We do not use a container, so I don't think the issue is container-related.

A workaround would be appreciated.
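
Until there is a real fix, one way to at least keep an automated step from blocking forever is to run the command under a time limit. This is a minimal sketch, not a fix for the prompt itself: run_with_deadline is a hypothetical helper name, it relies on the coreutils "timeout" command (which exits with status 124 when the limit is hit), and how splunk reacts to an empty stdin at the migration prompt is not something established in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_deadline is a hypothetical helper.

# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
# Runs the command with stdin redirected from /dev/null and a time limit,
# so an unexpected prompt makes the step fail instead of hanging.
run_with_deadline() {
    local seconds=$1
    shift
    local rc=0
    timeout "$seconds" "$@" < /dev/null || rc=$?
    if [ "$rc" -eq 124 ]; then
        # 124 is the exit status coreutils timeout uses for "timed out"
        echo "timed out after ${seconds}s: $*" >&2
    fi
    return "$rc"
}
```

For example: `run_with_deadline 120 /opt/splunkforwarder/bin/splunk status --accept-license --answer-yes --no-prompt`. A non-zero return then tells the calling playbook or script that the step did not complete.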

isoutamo · ‎07-06-2023

Hi

There has been a similar issue in the past where Docker needs a working tty. See:

https://community.splunk.com/t5/Installation/Upgrading-Universal-Forwarder-8-x-x-to-9-x-x-does-not-w...

This may or may not help, as there is also another case, https://community.splunk.com/t5/Installation/Why-is-universal-forwarder-not-starting-on-docker-image..., which is still open.

r. Ismo

Splunk forwarder container 9.0.5 not initializing and hanging on "splunk status" command

Linux

universal forwarder
