[systemd-devel] [PATCH] timesyncd: tighten unit file

Tue Jan 27 11:47:49 PST 2015

On 01/27/15 01:54, Lennart Poettering wrote:
> On Sun, 25.01.15 12:23, Topi Miettinen (toiwoton at gmail.com) wrote:
> 
>> There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
> 
> Hmm, that's not true, is it? load_clock_timestamp() is invoked before
> we drop privs in the daemon. And it certainly calls fchmod() and
> fchown(), so that it can later on still access the clock touch file.
> 
> What am I missing?

It works for me because the file exists already with correct owner and
permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now.

Maybe the clock file could reside in a separate subdirectory, then the
directory owner and permissions could be set up already at installation?

> 
>> No new privileges are needed, especially no setuid fixups are
>> expected.
> 
> Yeah, we should probably set this for most of our daemons, not just
> timesyncd. 
> 
> I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
> actually relate to each other. I.e. what does one do that the other
> doesn't? And if they do different things, isn't NoNewPrivileges= kinda
> a superset of no-setuid-fixup, and thus makes setting the latter
> redundant?

I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
of uid/gid changes, while no-setuid-fixup does not restrict uid change
but does not elevate the capability set.

> 
> Reading through Documentation/prctl/no_new_privs.txt in the
> kernel sources I found this paragraph:
> 
>        "Be careful, though: LSMs might also not tighten constraints on
>        exec in no_new_privs mode.  (This means that setting up a
>        general-purpose service launcher to set no_new_privs before
>        execing daemons may interfere with LSM-based sandboxing.)"
> 
> This kinda suggests that SELinux and friends might not like
> NoNewPrivileges=, hence I am not entirely sure it is really the right
> option for us.
> 
> Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
> better option for us to enable everywhere?

I don't think timesyncd is executing any helper programs, especially
set-uid or with file system capabilities, so both should work fine.

> 
>> Device policy can be closed, timesyncd does not access any devices.
> 
> Note that we set PrivateDevices=yes already, which implies
> DevicePolicy=closed.
> 
>> Timesyncd only needs write access to /var/lib/systemd/clock. There's no
>> need to access /boot nor most API filesystems.
> 
> Well, /boot is already covered by ProtectSystem=full actually (but
> this wasn't documented, admittedly. Updated the man page now in gate.)
> 
> I am a bit concerned about listing by default all these directories in
> the unit file. I mean, the reason why ProtectSystem=, ProtectHome= and
> suchlike exists is precisely to break this down to a few booleans (or
> boolean-like enums), instead of listing all directories in all unit
> files all the time. I mean, it's great if people do this on their
> local systems, but to make this palatable generically upstream I
> created ProtectSystem= and ProtectHome= to hide the directory lists
> away.
> 
> The problem with these lists is that we need to be really conservative
> with them, since many of the libs we use (including glibc) use
> different interface "behind our back", and thus writing generally
> valid rules, that work on all archs, on all distros, on all
> glibc/libary versions is hard. This is what SELinux policy does, but I
> *really* didn't want to create another implementation of such a
> finegrained policy, if you follow what I mean.

I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?

> 
>> Install a system call filter, generated with 'strace -c'.
> 
> Hmm, I am a bit concerned about this part, I am not sure we can really
> do this upstream. Different glibc and other library versions we use
> invoke different syscalls. Moreover, different archs use different
> syscalls, too. This makes it really difficult putting together syscall
> filters upsteram that work generically, on all downstream versions.
> 
> I think SystemCallFilter= is something that end-users can use on their
> systems, but I am not sure we can use this upstream. :-(

Right, this was generated on x86_64.

> 
>> Only one additional process is needed.
> 
> Hmm, LimitNPROC=2 unfortunately doesn't do what we want it to do:
> since the daemon drops the to the systemd-timesync user on its own,
> PID 1 invokes it as root, not as the the systemd-timesync user, which
> makes the excercise pointless... We should probably handle this better
> though, and warn, instead of silently accepting it. Added that to the
> TODO list now.

This probably applies to other limits too?

> 
> I have now changed the timesyncd code to do the right thing on its own
> after dropping privs. During runtime RLIMIT_NPROC=2 should now be set.
> 
>> Mounts should not propagate back, so set the MountFlags to slave
>> (actually default since we use e.g. PrivateTmp).
> 
> This is implied by PrivateTmp=, indeed, hence unnecessary.
> 
> Lennart
>