[systemd-devel] Mounting /proc with -o hidepid breaks sd-login

Lennart Poettering lennart at poettering.net
Mon Oct 8 15:06:54 PDT 2012


On Tue, 09.10.12 00:28, Marti Raudsepp (marti at juffo.org) wrote:

> Hi list,
> 
> Recently I upgraded to Gnome 3.6 on my Arch Linux desktop, but
> gnome-session didn't work no matter what I tried. Ages of debugging
> later, strace revealed this:
> [pid  2063] open("/proc/1/cgroup", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No
> such file or directory)
> [...]
> [pid  2063] writev(2, [{"gnome-session[2063]: WARNING: Could not get
> session id for session. Check that logind is properly in"..., 150}],
> 1) = 150
> 
> Turns out it happens because I was mounting /proc with hidepid=2 on my
> systems. It's a nice security feature introduced in Linux 3.3 which
> hides all other users' processes from unprivileged users.
> 
> Jan Steffens pointed out that this open call actually comes from
> systemd's sd-login. What's the reason why sd-login needs to poke
> around in init's cgroups? It's being called by sd_pid_get_owner_uid
> and sd_pid_get_session, but I'm not entirely clear what's happening in
> that code.

So, basically, this is because cgroups are not virtualized for
containers. In order to figure out which service cgroup a process is
in, we need to find the cgroup of our entire system, so that we can chop
off the head of the full cgroup path. This is necessary to make systemd
work nicely inside of containers.

Example:

A) On the host, sshd is in /system/sshd.service. 

B) A container is in /user/lennart/1/nspawn-4711/, i.e. PID 1 is in this
   group.

C) sshd inside that container is in
   /user/lennart/1/nspawn-4711/system/sshd.service.

To determine the service of a process we hence detect the cgroup of
PID 1, then chop that off the process's cgroup path, then remove the
/system, and there we are. This logic hence makes unit cgroups work fine
regardless of how deep our hierarchies are nested.
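
The prefix-chopping logic above can be sketched roughly as follows. This is
an illustration only, not the actual sd-login code: the function names are
made up, the "name=systemd" hierarchy label is an assumption about how the
cgroup file is laid out, and treating a PID 1 cgroup of "/system" as the
root follows the observation in the quoted message rather than anything
verified here.

```python
def parse_cgroup_path(proc_cgroup_text, controller="name=systemd"):
    """Pick the path for one hierarchy out of /proc/<pid>/cgroup contents.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    The default controller label is an assumption for this sketch.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and controller in parts[1].split(","):
            return parts[2]
    return None


def unit_of(process_cgroup, init_cgroup):
    """Return the unit name for a process, or None if it is not in a service.

    process_cgroup: cgroup path of the process we are asking about.
    init_cgroup:    cgroup path of PID 1 (the root of "our" system).
    """
    root = init_cgroup.rstrip("/")
    # Assumption: if PID 1 itself sits in ".../system", the system root
    # is the parent of that group.
    if root.endswith("/system"):
        root = root[: -len("/system")]
    # The process must live below our system's root cgroup.
    if not process_cgroup.startswith(root + "/"):
        return None
    # Chop the head off the full path, then drop the "/system/" prefix.
    relative = process_cgroup[len(root):]
    if not relative.startswith("/system/"):
        return None
    return relative[len("/system/"):].split("/", 1)[0]


# Case A: sshd on the host.
print(unit_of("/system/sshd.service", "/system"))
# Case C: sshd inside the container whose PID 1 is in case B's group.
print(unit_of("/user/lennart/1/nspawn-4711/system/sshd.service",
              "/user/lennart/1/nspawn-4711/"))
```

Both calls resolve to the same unit name, which is the point: the result
does not depend on how deeply the container's cgroup is nested. With
hidepid=2, reading /proc/1/cgroup fails for unprivileged users, so
init_cgroup is simply not available to this kind of logic.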

> AFAICT on regular systems, init's cgroup is always "/system", in which
> case it gets ignored entirely by the code. Would it be safe to assume
> that on failure to open? Are there any other ways to solve this?

Well, such a constant fallback would fix your immediate problem but
would leave things broken for container setups. I'd really like to see
this fixed properly in some way. My first reaction would be to see hidepid=
fixed not to hide pid 1 in the kernel (it's kinda stupid anyway to hide
it, simply since it's the parent of all processes anyway, and is quite
frequently assumed to exist).

Also note that systemd actually relies on /proc/1 being around for other
purposes too, for example, to detect chroots, to read system env vars and
other things.
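
To give an idea of what such /proc/1 accesses can look like, here is a
rough sketch. It is not systemd's code: the helper names are hypothetical,
and comparing /proc/1/root against / is just one common way to detect a
chroot. Both checks need /proc/1 to be visible to the caller, which is
exactly what hidepid=2 takes away from unprivileged users.

```python
import os


def same_file(a, b):
    """True if two os.stat() results refer to the same inode."""
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def running_in_chroot():
    """Compare our root directory with the root directory of PID 1.

    Returns True or False when the comparison is possible, and None
    when /proc/1/root cannot be examined (hidden, no permission, or
    no /proc at all).
    """
    try:
        return not same_file(os.stat("/proc/1/root"), os.stat("/"))
    except OSError:
        return None


def parse_environ(raw):
    """Parse the NUL-separated bytes of a /proc/<pid>/environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty or malformed entries
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_init_environ():
    """Return PID 1's environment as a dict, or None if it is unreadable."""
    try:
        with open("/proc/1/environ", "rb") as f:
            return parse_environ(f.read())
    except OSError:
        return None
```

Every helper here has to fall back to "don't know" when /proc/1 is
missing, and there is no sensible constant to assume in that case.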

Dunno. Other ideas?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

