
[9.0.3+] Invalid file path /proc/1/cgroup while checking container status

sylim_splunk
Splunk Employee

After upgrading from 8.2.4 to 9.0.4.1, the forwarders connected to the indexers once the indexer cluster had stabilized. All looked good: new data is delivered and indexed, and searching works fine.

However, we started seeing the WARN-level messages below in splunkd.log:


05-31-2023 06:47:21.407 -0500 WARN SystemInfo [15415 TcpChannelThread] - Invalid file path /proc/1/cgroup while checking container status
 
During the upgrade, no new apps were added, and no container is used for Splunk. These messages are logged 3-4 times per minute by different components, on pretty much all Splunk instances, including the SH, deployer, LM, indexers and CM.

 

We would like an analysis of this message.


sylim_splunk
Splunk Employee

 

i) What's the WARN message?

This is a new message, added to the versions above 9.0.3 to support containerized hosts. If splunkd confirms that it is hosted in a container, it uses different library calls to detect the system configuration and capacity.

Basically, we query /sys/fs/cgroup/memory/ and /sys/fs/cgroup/cpu,cpuacct, and we read /proc/1/cgroup to see if we are running inside a container. First, splunkd validates that it is running inside a container; once that is validated, it reads the CPU/RAM values from the cgroup rather than from the traditional syscalls.
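The idea of reading /proc/1/cgroup to guess whether a host is containerized can be sketched as below. This is only an illustration and not Splunk's actual implementation: the function name classify_cgroup_file and the list of runtime markers (docker, kubepods, lxc, containerd) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper, NOT splunkd's real logic.
# Guesses "container" when the cgroup file of PID 1 mentions a known
# container runtime. The marker list is an assumption.
classify_cgroup_file() {
    file="$1"
    if [ ! -r "$file" ]; then
        # The file cannot be read by the current user (or does not exist).
        echo "unreadable"
    elif grep -Eq 'docker|kubepods|lxc|containerd' "$file"; then
        echo "container"
    else
        echo "host"
    fi
}

classify_cgroup_file /proc/1/cgroup
```

Run as the splunk user, an "unreadable" result would be consistent with the access problem analysed in the rest of this post.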

ii) Why it is happening after the upgrade:
- The check does not exist in 8.2.4. It starts to happen after the 9.0.4.1 upgrade because it was only added to the versions above 9.0.3.

iii) What the log message tells us:

- The splunk user is not allowed to access "/proc/1/cgroup", even though the file itself has the proper permissions:

# ls -ld /proc/1
dr-xr-xr-x 9 root root 0 Apr 12 00:45 /proc/1

# ls -l /proc/1/cgroup
-r--r--r-- 1 root root 0 Jun 15 17:37 /proc/1/cgroup

 


- Other users should be able to read and traverse "/proc/1" with "r-x", but the splunk account fails to access it, which produces the WARN message.
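Because the permissions look fine when listed as root, it can help to check the path from the point of view of the affected account. The sketch below walks an absolute path and prints the first component the current user cannot see. The function name first_blocked is hypothetical, and the script assumes a path without wildcard characters.

```shell
#!/bin/sh
# Hypothetical diagnostic: print the first path component that the
# current user cannot stat, or "ok" if the whole path is visible.
first_blocked() {
    path="$1"
    cur=""
    old_ifs="$IFS"
    IFS="/"
    # Split the absolute path on "/" into positional parameters.
    set -- $path
    IFS="$old_ifs"
    for part in "$@"; do
        [ -z "$part" ] && continue
        cur="$cur/$part"
        if [ ! -e "$cur" ]; then
            echo "$cur"
            return 0
        fi
    done
    echo "ok"
}

first_blocked /proc/1/cgroup
```

Running it as the splunk user (for example via sudo -u splunk, with the script saved under a name of your choice) and again as root shows whether the two accounts see the path differently.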
 

iv) The reason for the access failure:

The proc file system was mounted with the gid and hidepid=2 options, as shown below. It is only accessible to accounts in the group 14001, which the splunk account doesn't belong to.

----- mount options for proc -----
$ mount -t proc
proc on /proc type proc (rw,relatime,gid=14001,hidepid=2)
------

 

- hidepid=0: The default. Every user can see all data.
- hidepid=1: The directory entries of other users' processes in /proc remain visible, but are not accessible.
- hidepid=2: The directory entries of other users' processes are hidden altogether.
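To check which of these values is in effect on a host, the options of the /proc mount can be pulled apart as in this sketch. The helper name mount_opt is hypothetical; the script assumes a Linux host where /proc/mounts lists the mount options in its fourth field.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key=value mount option,
# or nothing if the option is absent.
mount_opt() {
    opts="$1"
    key="$2"
    printf '%s\n' "$opts" | tr ',' '\n' | sed -n "s/^${key}=//p"
}

# The fourth field of /proc/mounts holds the options of each mount.
proc_opts=$(awk '$2 == "/proc" && $3 == "proc" { print $4; exit }' /proc/mounts 2>/dev/null) || proc_opts=""
echo "hidepid=$(mount_opt "$proc_opts" hidepid) gid=$(mount_opt "$proc_opts" gid)"
```

An empty value after hidepid= means the option is not set on the mount, which corresponds to the default of 0.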

Reference: https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/ 

v) Resolutions:

i) Add the splunk user to the group with gid 14001:

$ usermod -aG 14001 splunk
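After changing the membership, it may be worth confirming that it took effect. A minimal sketch, assuming the account is named splunk and the group id is 14001 as in this post; the helper name has_gid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the gid in $2 appears in the
# space-separated list of group ids in $1.
has_gid() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# "splunk" and 14001 come from this post; adjust them for your system.
if splunk_gids=$(id -G splunk 2>/dev/null); then
    if has_gid "$splunk_gids" 14001; then
        echo "splunk is in gid 14001"
    else
        echo "splunk is NOT in gid 14001"
    fi
else
    echo "no splunk user on this host"
fi
```

A running process keeps the group list it started with, so splunkd presumably has to be restarted before the new membership makes a difference.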

Or
 
ii) Remove the gid and hidepid options:

$ mount -o remount,gid=0,hidepid=0 /proc
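A remount only changes the running system. If the options were set in /etc/fstab, they would come back at the next boot. The sketch below checks for that; it assumes /etc/fstab is where the options are configured (they could also be set elsewhere), and the helper name fstab_proc_opts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the options column of the /proc entry
# in an fstab-style file, if there is one.
fstab_proc_opts() {
    awk '$1 !~ /^#/ && $2 == "/proc" { print $4 }' "$1"
}

if [ -r /etc/fstab ]; then
    opts=$(fstab_proc_opts /etc/fstab)
    case "$opts" in
        *hidepid=*) echo "fstab still sets: $opts" ;;
        *) echo "no hidepid option for /proc in /etc/fstab" ;;
    esac
fi
```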

 

FYI, the Red Hat documentation recommends that hidepid not be used with systemd on RHEL 7+, for the reasons described at https://access.redhat.com/solutions/6704531 .
