Hello everyone,
My environment:
Part of my infrastructure is deployed as Docker containers that I build and configure myself. Basically, I pull the ubuntu:latest image and install a Splunk universal forwarder (UF) on it, which forwards logs to a central Splunk Enterprise instance.
At start time I use supervisord, a process control system, to start the UF and other processes.
The following steps are done on every build / deployment:
- Pulling the latest image of Ubuntu
- Installing / Configuring the Splunk forwarder (creating the user / downloading the .deb / installing it)
- Installing other packages
- Starting the Docker container with a simple bash script that does things at runtime.
- Starting the services via supervisord as root, which then starts the UF as the splunk user.
Configurations
My Dockerfile.
FROM ubuntu:latest
ENV TZ=Europe/Paris
ARG DEBIAN_FRONTEND=noninteractive
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN adduser --home /home/www-python --disabled-password --gecos "" www-python \
&& groupadd -r splunk \
&& useradd -r -m -g splunk splunk \
&& apt update \
&& apt install -y python3 python3-pip wget curl supervisor
RUN wget -O splunkforwarder-8.2.8-da25d08d5d3e-linux-2.6-amd64.deb "https://download.splunk.com/products/universalforwarder/releases/8.2.8/linux/splunkforwarder-8.2.8-da25d08d5d3e-linux-2.6-amd64.deb" \
&& dpkg -i splunkforwarder-*.deb \
&& rm -f splunkforwarder-*
COPY [ "src/splunkforwarder/inputs.conf", "src/splunkforwarder/outputs.conf", "src/splunkforwarder/server.conf", "/opt/splunkforwarder/etc/system/local/" ]
USER root
WORKDIR /root/
COPY [ "src/supervisor/service.conf", "/root/"]
COPY ./src/start.sh /root/
RUN chmod +x /root/start.sh
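Since the only thing that changes between a working and a failing build is the forwarder version, it can help to generate the download URL from the version and build hash rather than editing it by hand. This is a minimal sketch: the function name uf_deb_url is made up for illustration, and the URL pattern is simply copied from the 8.2.8 URL above and assumed to hold for other releases. The build hash for any other version has to be looked up on the Splunk download page.

```shell
#!/bin/bash
# Hypothetical helper: build the .deb download URL for a universal forwarder release.
# The pattern is taken from the 8.2.8 URL in the Dockerfile; it is an assumption
# that other releases follow the same layout.
uf_deb_url() {
    local version="$1" build="$2"
    printf 'https://download.splunk.com/products/universalforwarder/releases/%s/linux/splunkforwarder-%s-%s-linux-2.6-amd64.deb\n' \
        "$version" "$version" "$build"
}

# Reproduces the URL used in the Dockerfile.
uf_deb_url 8.2.8 da25d08d5d3e
```

The output could then be passed to the image build as a build argument so that switching versions does not require touching the Dockerfile.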
My start.sh script.
#!/bin/bash
#Doing runtime stuff
supervisord -c /root/service.conf
My supervisor configuration.
[supervisord]
nodaemon=true
user=root
[program:splunkforwarder]
command=/opt/splunkforwarder/bin/splunk start --accept-license --answer-yes --no-prompt
user=splunk
[program:python-script]
command=some command to start a service
Problem encountered
I've traced the problem back to version 9.0.0: all the steps and configuration in this post work on every version below 9.0.0.
Version 9.0.1 shows the same behavior.
When the container starts, supervisord reports that everything started smoothly.
2022-09-12 09:15:17,813 INFO Set uid to user 0 succeeded
2022-09-12 09:15:17,844 INFO supervisord started with pid 10
2022-09-12 09:15:18,856 INFO spawned: 'python-script' with pid 11
2022-09-12 09:15:18,858 INFO spawned: 'splunkforwarder' with pid 12
2022-09-12 09:15:19,863 INFO success: python-web entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-09-12 09:15:19,863 INFO success: splunkforwarder entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
But here is the catch: the splunk start process seems to be stuck. As the ps aux output shows, it never finishes and keeps burning CPU (56% here, with more than 145 minutes of CPU time accumulated).
root@demo:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2904 1012 ? Ss 09:15 0:00 /bin/sh -c /root/start.sh
root 7 0.0 0.0 4508 3516 ? S 09:15 0:00 /bin/bash /root/start.sh
root 10 0.0 0.1 33220 27204 ? S 09:15 0:02 /usr/bin/python3 /usr/bin/supervisord -c /root/service.conf
splunk 12 56.0 0.0 4516 2900 ? R 09:15 145:55 /opt/splunkforwarder/bin/splunk start --accept-license --answer-yes --no-prompt
When I look for more logs, nothing has been created, as if the service never started.
root@demo:~# ls -alh /opt/splunkforwarder/var/log/splunk/
total 4.0K
drwx------ 2 splunk splunk 31 Sep 12 09:15 .
drwx--x--- 5 splunk splunk 57 Sep 12 09:15 ..
-rw------- 1 splunk splunk 70 Sep 12 09:15 first_install.log
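The same check can be scripted, for example as a container health probe. This is a minimal sketch, assuming the default log directory from the listing above and that a forwarder which has actually started writes splunkd.log there. The function name splunkd_has_logged is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if splunkd.log exists and is non-empty
# in the given log directory (defaults to the UF location used in this post).
splunkd_has_logged() {
    local log_dir="${1:-/opt/splunkforwarder/var/log/splunk}"
    [ -s "$log_dir/splunkd.log" ]
}

if splunkd_has_logged; then
    echo "splunkd.log is present: the daemon got far enough to write its log"
else
    echo "no splunkd.log: the daemon has not written anything yet"
fi
```

The directory is only readable by the splunk user, so this has to run as splunk or root.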
If I try to start the service manually, it asks me to accept the license agreement and reports that a previous installation has been found and that the instance needs to be migrated.
The only error messages I have are these lines:
Creating unit file...
Error calling execve(): No such file or directory
Error launching command: No such file or directory
As I said earlier, the only thing that changed in this case is the Splunk version. I can start the service with version 8.2.x but not with the latest one.
Does anyone have any input on this matter? I didn't find any insight in the threads on this site (or elsewhere).
Were you able to fix this?
So I ran into the same problem.
In my case, I was running kernel 2.6.x, and 9.0 dropped support for it.
Check your kernel version and see if that's the issue (uname -msr).
Supported versions for 8.2: https://docs.splunk.com/Documentation/Splunk/8.2.5/Installation/Systemrequirements
vs. 9.0.2:
https://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements?_ga=2.37568416.1....
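A minimal sketch of that kernel check as a script. The function names and the MIN_KERNEL default of 3.0 are assumptions for illustration only; take the real minimum from the system requirements page for the version you install. Note that inside a container, uname reports the host's kernel, so the result depends on the Docker host and not on the ubuntu image.

```shell
#!/bin/bash
# Succeeds when version $1 is greater than or equal to version $2.
# sort -V orders version strings, so if the smaller of the two is $2, then $1 >= $2.
version_at_least() {
    local have="$1" want="$2"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Drop any distribution suffix, e.g. "5.15.0-47-generic" becomes "5.15.0".
kernel_release() {
    uname -r | cut -d- -f1
}

# MIN_KERNEL is an assumed placeholder, not a value taken from the Splunk docs.
MIN_KERNEL="${MIN_KERNEL:-3.0}"

if version_at_least "$(kernel_release)" "$MIN_KERNEL"; then
    echo "kernel $(uname -r) is at least $MIN_KERNEL"
else
    echo "kernel $(uname -r) is older than $MIN_KERNEL"
fi
```

Running it on the Docker host and inside the container should give the same answer, which is a quick way to confirm that the container shares the host kernel.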