[systemd-devel] [PATCH] timesyncd: tighten unit file

Lennart Poettering lennart at poettering.net
Mon Jan 26 17:54:11 PST 2015


On Sun, 25.01.15 12:23, Topi Miettinen (toiwoton at gmail.com) wrote:

> There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.

Hmm, that's not true, is it? load_clock_timestamp() is invoked before
we drop privs in the daemon. And it certainly calls fchmod() and
fchown(), so that it can later on still access the clock touch file.

What am I missing?

> No new privileges are needed, especially no setuid fixups are
> expected.

Yeah, we should probably set this for most of our daemons, not just
timesyncd. 

I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
actually relate to each other. I.e. what does one do that the other
doesn't? And if they do different things, isn't NoNewPrivileges= kinda
a superset of no-setuid-fixup, and thus makes setting the latter
redundant?

Reading through Documentation/prctl/no_new_privs.txt in the
kernel sources I found this paragraph:

       "Be careful, though: LSMs might also not tighten constraints on
       exec in no_new_privs mode.  (This means that setting up a
       general-purpose service launcher to set no_new_privs before
       execing daemons may interfere with LSM-based sandboxing.)"

This kinda suggests that SELinux and friends might not like
NoNewPrivileges=, hence I am not entirely sure it is really the right
option for us.

Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
better option for us to enable everywhere?

> Device policy can be closed, timesyncd does not access any devices.

Note that we set PrivateDevices=yes already, which implies
DevicePolicy=closed.

> Timesyncd only needs write access to /var/lib/systemd/clock. There's no
> need to access /boot nor most API filesystems.

Well, /boot is already covered by ProtectSystem=full actually (but
this wasn't documented, admittedly. Updated the man page now in gate.)

I am a bit concerned about listing by default all these directories in
the unit file. I mean, the reason why ProtectSystem=, ProtectHome= and
suchlike exists is precisely to break this down to a few booleans (or
boolean-like enums), instead of listing all directories in all unit
files all the time. I mean, it's great if people do this on their
local systems, but to make this palatable generically upstream I
created ProtectSystem= and ProtectHome= to hide the directory lists
away.

The problem with these lists is that we need to be really conservative
with them, since many of the libs we use (including glibc) use
different interface "behind our back", and thus writing generally
valid rules, that work on all archs, on all distros, on all
glibc/libary versions is hard. This is what SELinux policy does, but I
*really* didn't want to create another implementation of such a
finegrained policy, if you follow what I mean.

> Install a system call filter, generated with 'strace -c'.

Hmm, I am a bit concerned about this part, I am not sure we can really
do this upstream. Different glibc and other library versions we use
invoke different syscalls. Moreover, different archs use different
syscalls, too. This makes it really difficult putting together syscall
filters upsteram that work generically, on all downstream versions.

I think SystemCallFilter= is something that end-users can use on their
systems, but I am not sure we can use this upstream. :-(

> Only one additional process is needed.

Hmm, LimitNPROC=2 unfortunately doesn't do what we want it to do:
since the daemon drops the to the systemd-timesync user on its own,
PID 1 invokes it as root, not as the the systemd-timesync user, which
makes the excercise pointless... We should probably handle this better
though, and warn, instead of silently accepting it. Added that to the
TODO list now.

I have now changed the timesyncd code to do the right thing on its own
after dropping privs. During runtime RLIMIT_NPROC=2 should now be set.

> Mounts should not propagate back, so set the MountFlags to slave
> (actually default since we use e.g. PrivateTmp).

This is implied by PrivateTmp=, indeed, hence unnecessary.

Lennart

-- 
Lennart Poettering, Red Hat


More information about the systemd-devel mailing list