[systemd-devel] Should services be able to run without /proc?

Antonius Frie antonius.frie at ruhr-uni-bochum.de
Tue Feb 9 14:57:29 UTC 2021


Hi!

So this is kind of a follow-up to the thread in [1], and the 
corresponding PR in [2].

In short, the PR made some changes to allow for cases where /proc was 
not available in the mount namespace of the service, and added a test 
[3] to make sure that this would work. This test was later removed and 
rewritten to block /sys instead [4], because it turned out that having 
/proc unavailable sometimes caused problems with close_all_fds(), which 
is called in exec_child() after namespaces have been set up.
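
As a rough illustration of why a helper like this can depend on /proc,
here is a minimal Python sketch (not systemd's actual C code). The
function names, the probing fallback, and the 65536 cap are all
assumptions made for the example: it lists open fds from /proc/self/fd
when that is available, and otherwise probes a bounded range of fd
numbers.

```python
import os
import resource


def _is_open(fd: int) -> bool:
    """Return True if fd currently refers to an open file description."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def list_open_fds() -> list:
    """Hypothetical helper: enumerate the open fds of the current process."""
    try:
        # Fast path: every open fd shows up as an entry in /proc/self/fd.
        candidates = [int(name) for name in os.listdir("/proc/self/fd")]
    except OSError:
        # Assumed fallback when /proc is not mounted: probe fd numbers up to
        # the soft RLIMIT_NOFILE, capped at an arbitrary 65536.
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        limit = 65536 if soft == resource.RLIM_INFINITY else min(soft, 65536)
        candidates = range(limit)
    # os.listdir() briefly opens a directory fd of its own, so filter out
    # anything that is no longer open by the time we look at it.
    return sorted(fd for fd in candidates if _is_open(fd))
```

Without /proc, only the slower probing path remains, and it can miss
fds above the chosen cap.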

On current master, services that don't have /proc mounted don't work at 
all anymore, since find_executable_full() ends up opening the given path 
and calling access_fd() on the resulting fd, and access_fd() uses 
/proc/self/fd/* to turn the fd back into a path it can call access() on. 
As far as I can tell, the reason for not calling access() on the path 
directly is that access_fd() is more elegant, since checking the 
already-open fd avoids a potential race where the path changes between 
the check and the open.
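
The open-then-check pattern described above can be sketched in Python as
follows. This is only an illustration of the idea, not the systemd
implementation; is_executable() is a hypothetical stand-in for the
relevant part of find_executable_full().

```python
import os


def access_fd(fd: int, mode: int) -> bool:
    """Check access permissions for an already-open fd via /proc/self/fd."""
    # access() follows the /proc/self/fd/<fd> link to the file the fd refers
    # to, so the check applies to the file that was actually opened rather
    # than to whatever the original path names later on.
    return os.access(f"/proc/self/fd/{fd}", mode)


def is_executable(path: str) -> bool:
    """Hypothetical helper: open the path first, then check the fd."""
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        return access_fd(fd, os.X_OK)
    finally:
        os.close(fd)
```

If /proc is not mounted, the os.access() call has nothing to resolve and
returns False for every fd, which matches the failure described above.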

In addition to this, setup_private_users() also needs access to 
/proc/$pid/{uid_map, gid_map, setgroups} to do its job.
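
For reference, the map files mentioned here hold one mapping per line in
the form "inside-ID outside-ID count". The Python sketch below only
reads and parses such a file to show the format; setup_private_users()
writes to these files instead. read_id_map() is a hypothetical name
used for this example.

```python
import os


def read_id_map(pid: int, kind: str = "uid_map") -> list:
    """Parse /proc/<pid>/uid_map or gid_map into (inside, outside, count)."""
    if kind not in ("uid_map", "gid_map"):
        raise ValueError(f"unsupported map file: {kind}")
    entries = []
    with open(f"/proc/{pid}/{kind}") as f:
        for line in f:
            # Each line carries three whitespace-separated integers.
            inside, outside, count = (int(field) for field in line.split())
            entries.append((inside, outside, count))
    return entries
```

Both reading and writing go through /proc/<pid>/, so neither works when
/proc is missing from the mount namespace.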

Given all this, I guess my question is whether it is still desirable to 
allow units to run without /proc, especially given that ProtectProc and 
ProcSubset exist now.* If not, it might be nice to just always mount 
/proc if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory 
is used); currently, MountAPIVFS=yes is basically a required option 
because of this. (I guess you could mount /proc manually, but then you 
can't use ProtectProc/ProcSubset.) I'm a bit unhappy about this, because 
MountAPIVFS also mounts /sys and /dev, and then you need separate 
options just to protect those again. Either way, maybe it would be good 
to explicitly state this requirement in the documentation?

Anyway, I hope that this was okay to post here, I don't really know a 
lot about this and maybe there are good reasons for why things are the 
way they are. I'd be happy about feedback though.

Cheers,
Antonius

* Using both ProtectProc=ptraceable and ProcSubset=pid really doesn't
let a lot of things through, and I don't think those interfere with any 
of the functions described above. The only thing I'm unsure about is 
setup_private_users(), since that spawns off a child process which then
goes and writes to /proc/$parent_pid/, but I guess children can ptrace
their parents? At least it seemed to work when I just tested it.

[1]: https://lists.freedesktop.org/archives/systemd-devel/2017-April/038634.html
[2]: https://github.com/systemd/systemd/pull/5985
[3]: https://github.com/systemd/systemd/pull/6017
[4]: https://github.com/systemd/systemd/commit/054d871d41039fcfc1a4a661c979941b9660c9e6

