Installation

Splunk forwarder container 9.0.5 not initializing and hanging on "splunk status" command

NevTheRev
Engager

Greetings !

We've been using version 8.1.12 of the forwarder container for some time (years) and need to move to version 9. I have not been able to get the new version running: the container does not initialize and cannot forward logs.

Most recently we used docker.io/splunk/universalforwarder:latest
Digest: sha256:88fb1a2b8d4f47bea89b642973e6502940048010cd9ed288c713ac3c7d079a82
Our deployment is an unmodified image.

The container launches but on closer inspection (by opening a shell into the container) I can see it's hanging on the splunk status command (from ps -ef):
/opt/splunkforwarder/bin/splunk status --accept-license --answer-yes --no-prompt

If I run the same command (as above), I can see that it prompts on the following:

Perform migration and upgrade without previewing configuration changes? [y/n]

Answering "y" seems to move things along and it responds (with lots more lines):

"-- Migration information is being logged to '/opt/splunkforwarder/var/log/splunk/migration.log.2023-07-05.13-55-37' --"

After this, I can manually start the splunk forwarder !

Is there "something" I can do so that it passes through this step without prompting?

 

Here's some background if it helps:

We're using the same Azure Kubernetes service (AKS) 1.26.3 as before with Splunk forwarder 8.1

We're mapping in the following files:

/opt/splunk/etc/auth/sunsuper/splunkclient.chain
/opt/splunk/etc/auth/sunsuper/splunkclient.pem
/opt/splunkforwarder/etc/system/local/outputs.conf
/opt/splunkforwarder/etc/apps/ta-inspire/local/server.conf
/opt/splunkforwarder/etc/apps/ta-inspire/local/inputs.conf

and launching the container with the same (YAML) environment as before:

          env:
            - name: TZ
              value: Australia/Brisbane
            - name: SPLUNK_START_ARGS
              value: '--accept-license --answer-yes --no-prompt'
            - name: SPLUNK_USER
              value: root
            - name: SPLUNK_FORWARD_SERVER
              value: fwdhost.probably.com.au:9997
            - name: SPLUNK_FORWARD_SERVER_ARGS
              value: >-
                -ssl-cert-path /opt/splunk/etc/auth/sunsuper/splunkclient.pem
                -ssl-root-ca-path
                /opt/splunk/etc/auth/sunsuper/splunkclient.chain -ssl-password
                secret -ssl-common-name-to-check
                fwdhost.probably.com.au -ssl-verify-server-cert false -auth
                admin:secret
            - name: ENVIRONMENT
              value: UNIT
            - name: SPLUNK_PASSWORD
              value: secret
            - name: SPLUNK_STANDALONE_URL
              value: fwdhost.probably.com.au:9997

Many thanks,
Nev


NevTheRev
Engager

Thanks for your input @reinier_post, I don't think it's container related but more related to the packaged files and what "splunk status" does on its first run. (I may have a workaround that can assist you.)

Thanks also @isoutamo,  I have tried the "tty" fix and it made no difference to the deployment stopping at the "splunk status" step.

What I did as a workaround (read "hack") was to determine what "splunk status" changes in the /opt/splunkforwarder/ directory. As I mentioned, when I ran the command manually it prompted once, and subsequent runs executed without prompting. So what changes? I captured the file timestamps before manually running the "splunk status" command:

find . -exec ls -ld {} \; > /tmp/list_before_status.txt

and after I'd manually run the command:

find . -exec ls -ld {} \; > /tmp/list_after_status.txt

I compared the lists in "WinMerge" ("diff" is a little harder to visualize) and here's what I established:

(1) It deletes ./ftr  (does that stand for "First Time Run" ? Is it a flag for "splunk status"?)
(2) It creates in the /opt/splunkforwarder directory:

var/log/splunk/migration.log.2023-07-07.16-47-57 (file)
var/run (a bunch of empty folders)
var/lib (a bunch of empty folders)
var/spool (a bunch of empty folders)
etc/apps/ta-inspire/metadata (it took a sniff of my app and created this directory)
etc/system/local/migration.conf (file)
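
For anyone who wants to repeat the comparison without WinMerge, here is a minimal sketch in shell. It assumes `find`, `sort`, `comm` and `sed` are available in the container. The helper names `snapshot` and `compare_snapshots` are hypothetical, not part of Splunk. Unlike the `ls -ld` listings above, it compares path names only, so it shows what was created or deleted but not which existing files were modified.

```shell
#!/bin/sh
# Hypothetical helper: record a sorted listing of every path under a directory.
snapshot() {
    # $1 = directory to scan, $2 = output file (keep it outside the scanned tree)
    ( cd "$1" && find . | LC_ALL=C sort ) > "$2"
}

# Hypothetical helper: print removed paths prefixed "- " and added paths prefixed "+ ".
compare_snapshots() {
    # $1 = listing taken before, $2 = listing taken after
    LC_ALL=C comm -23 "$1" "$2" | sed 's/^/- /'
    LC_ALL=C comm -13 "$1" "$2" | sed 's/^/+ /'
}

# Example usage inside the container (paths are the ones from this thread):
#   snapshot /opt/splunkforwarder /tmp/list_before_status.txt
#   ... run "splunk status" manually and answer the prompt ...
#   snapshot /opt/splunkforwarder /tmp/list_after_status.txt
#   compare_snapshots /tmp/list_before_status.txt /tmp/list_after_status.txt
```

Sorting both listings under `LC_ALL=C` matters because `comm` expects its inputs in the same collation order it uses itself.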

My solution (yes, a "hack") was to capture these files/directories, put them into my Docker build, and delete the "ftr" directory.

I created a tar archive:

tar cf /tmp/Splunk905_things.tar \
var/log/splunk/migration.log.2023-07-07.16-47-57 \
var/run \
var/lib \
var/spool \
etc/apps/ta-inspire/metadata \
etc/system/local/migration.conf

I extracted the tar archive file from the running container and saved to my docker build directory.

In my Dockerfile I wanted to include these steps:

  1. Update the image (always good practice)
  2. Remove the "ftr" folder
  3. Create the files/directories that "splunk status" would do on first run

Here it is:

FROM splunk/universalforwarder:latest
LABEL authors="nev"
USER root:root
COPY ./Splunk905_things.tar /tmp/Splunk905_things.tar
RUN microdnf update -y \
&& cd /opt/splunkforwarder \
&& rm -rf ftr \
&& tar xvf /tmp/Splunk905_things.tar

I was able to build this image, deploy it and restart the container which deployed normally and began forwarding logs as expected.

That is,  it works !!
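
To check whether an image or host still carries the first-run marker, a small shell test is enough. This is a sketch only: the helper name `first_run_pending` is hypothetical, and the idea that the "ftr" entry is what triggers the prompt is the hunch described in this post, not something confirmed by Splunk.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the "ftr" marker still exists under Splunk home.
first_run_pending() {
    # $1 = Splunk home directory; defaults to the path used in this thread
    [ -e "${1:-/opt/splunkforwarder}/ftr" ]
}

if first_run_pending "${SPLUNK_HOME:-/opt/splunkforwarder}"; then
    echo "ftr marker present: 'splunk status' may stop at the migration prompt"
else
    echo "ftr marker absent: first-run migration prompt not expected"
fi

# Example against a built image (the tag "my-uf:test" is hypothetical):
#   docker run --rm --entrypoint sh my-uf:test -c '[ -e /opt/splunkforwarder/ftr ]'
#   echo $?   # 0 means the marker is still there
```

The test uses `-e` rather than `-d` so it works whether "ftr" is a file or a directory.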

Disclaimer: I have no idea what the first run of "splunk status" does when it's executed from the Ansible playbook as part of the deployment of Splunk forwarder 9.0.5. I may have captured my situation, but it might fall short for yours. Maybe it was only the "ftr" folder that was holding this up. I followed a hunch: the first time, "splunk status" had to prompt and the second time it didn't, so why not make the deployment look like the second run?

In any case, if you pass the options "--no-prompt --answer-yes" to a command, you don't want it to stop and ask whether or not you want to migrate your brand new deployment. (bug?)

 

reinier_post
Explorer

Thank you Nev, this is very interesting, but I don't want to rely on a hack like this in production unless it's sanctioned by Splunk support.


isoutamo
SplunkTrust

Have you tried running this first:

splunk start --accept-license --answer-yes --no-prompt

before you run

splunk status ...

The first one wants you to authenticate; the second one doesn't need that, if I recall correctly.


reinier_post
Explorer

Yes, I tried that, too: no difference. With start, it hangs as well.


reinier_post
Explorer

I'm running into the exact same issue: after upgrading a Universal Forwarder from 8.1.2 my Ansible role is issuing

  bin/splunk status --no-prompt --answer-yes --accept-license

to make sure Splunk won't hang later, and this command is hanging waiting for input.

We do not use a container, so I don't think the issue is container-related.

A workaround would be appreciated.
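
Until a real fix exists, automation can at least fail fast instead of hanging forever. The sketch below is not a workaround for the prompt itself. It wraps a command in GNU coreutils `timeout` with stdin redirected from /dev/null. The wrapper name `run_noninteractive` and the 60-second limit are hypothetical, and how "splunk status" reacts to an empty stdin has not been verified here.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with an empty stdin and a time limit.
run_noninteractive() {
    # $1 = time limit in seconds; remaining arguments = command to run
    limit="$1"
    shift
    timeout "$limit" "$@" < /dev/null
}

# Example (command line taken from this thread; the limit is arbitrary):
#   cd /opt/splunkforwarder
#   run_noninteractive 60 bin/splunk status --no-prompt --answer-yes --accept-license
#   rc=$?
#   [ "$rc" -eq 124 ] && echo "splunk status timed out, probably waiting for input"
```

GNU `timeout` exits with status 124 when the limit is reached, so an Ansible task or script can tell "hung at a prompt" apart from an ordinary non-zero status.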

isoutamo
SplunkTrust

Hi

At least in earlier versions there has been this kind of issue, where Docker needs a working tty. See

https://community.splunk.com/t5/Installation/Upgrading-Universal-Forwarder-8-x-x-to-9-x-x-does-not-w...

This may or may not help, as there is also another case https://community.splunk.com/t5/Installation/Why-is-universal-forwarder-not-starting-on-docker-image... which is still open.

r. Ismo
