[systemd-devel] [PATCH] timesyncd: tighten unit file

Tue Jan 27 13:58:20 PST 2015

On 01/27/15 21:16, Lennart Poettering wrote:
> On Tue, 27.01.15 19:47, Topi Miettinen (toiwoton at gmail.com) wrote:
> 
>> On 01/27/15 01:54, Lennart Poettering wrote:
>>> On Sun, 25.01.15 12:23, Topi Miettinen (toiwoton at gmail.com) wrote:
>>>
>>>> There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
>>>
>>> Hmm, that's not true, is it? load_clock_timestamp() is invoked before
>>> we drop privs in the daemon. And it certainly calls fchmod() and
>>> fchown(), so that it can later on still access the clock touch file.
>>>
>>> What am I missing?
>>
>> It works for me because the file exists already with correct owner and
>> permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now. 
> 
> Well, but this also means CAP_DAC_OVERRIDE is required, as otherwise
> we won't be able to access the file while we are still root, and have
> not dropped privileges anymore.
> 
> Note that timesyncd actually drops all remaining caps when changng
> to the "systemd-timesync" user, with the exception of
> CAP_SYS_TIME. This means that the caps configured in the unit file
> don't matter too much as they only are in effect for the short time
> when timesyncd is initializing and hasn't dropped all the remaining
> caps except for CAP_SYS_TIME yet.
> 
>> Maybe the clock file could reside in a separate subdirectory, then the
>> directory owner and permissions could be set up already at
>> installation?
> 
> Well, to enable stateless systems I think it is a good idea to write
> services in a way that they can rebuild what they need in /var on
> their own, should it be missing, so that /var can be flushed out and
> things will just work when the machine is then booted again.
> 
> All our daemons are written in a style where /var is reconstructed
> implicitly if it is missing, and timesyncd really should work the same
> way.

Nice, but the directories could be created with tmpfiles.d then?

> 
>>> I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
>>> actually relate to each other. I.e. what does one do that the other
>>> doesn't? And if they do different things, isn't NoNewPrivileges= kinda
>>> a superset of no-setuid-fixup, and thus makes setting the latter
>>> redundant?
>>
>> I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
>> of uid/gid changes, while no-setuid-fixup does not restrict uid change
>> but does not elevate the capability set.
> 
> The PR_SET_NO_NEW_PRIVS docs say explicitly it only applies to
> execve(). 
> 
> To be honest, I am still very puzzled about this. The docs are awful
> about this...
> 
>>> Reading through Documentation/prctl/no_new_privs.txt in the
>>> kernel sources I found this paragraph:
>>>
>>>        "Be careful, though: LSMs might also not tighten constraints on
>>>        exec in no_new_privs mode.  (This means that setting up a
>>>        general-purpose service launcher to set no_new_privs before
>>>        execing daemons may interfere with LSM-based sandboxing.)"
>>>
>>> This kinda suggests that SELinux and friends might not like
>>> NoNewPrivileges=, hence I am not entirely sure it is really the right
>>> option for us.
>>>
>>> Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
>>> better option for us to enable everywhere?
>>
>> I don't think timesyncd is executing any helper programs, especially
>> set-uid or with file system capabilities, so both should work fine.
> 
> Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
> after forking, but before execing systemd-timesyncd, and that binary
> has an selinux label on it, that the selinux label would be ignore,
> and things would continue to run with the same label as we forked
> off. This pretty much renders SELinux usesless, since all services
> where we set the bit for would then run as "init_t"... and that's
> something we really shouldn't do.

seccomp_filter.txt on the other hand says that
"Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace.  If these are not
true, -EACCES will be returned.  This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them."

Does this mean that SELinux and seccomp are mutually exclusive? Awful
design if so.

> 
>>> The problem with these lists is that we need to be really conservative
>>> with them, since many of the libs we use (including glibc) use
>>> different interface "behind our back", and thus writing generally
>>> valid rules, that work on all archs, on all distros, on all
>>> glibc/libary versions is hard. This is what SELinux policy does, but I
>>> *really* didn't want to create another implementation of such a
>>> finegrained policy, if you follow what I mean.
>>
>> I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
>> you be interested in AppArmor profile for timesyncd, I have one? Also,
>> if a distro uses weird SELinux policies or setuid helpers at every
>> possible opportunity, shouldn't they have some responsibility of fixing
>> their setup?
> 
> Well, SELinux policy is managed in a central selinux policy database
> that is shipped in one big RPM. My selinux-fu is not good enough to
> maintain the policy file in systemd, and i am not sure this even is
> generic enough to be able to ship the same selinux policy that works
> across all distros that do selinux.
> 
> If Apparmor policies are standardized and stand-alone enough, and
> there's a clear way to install them, and you are willing to look after
> it, then yes, I'd merge a patch that adds apparmor profiles to systemd
> upstream.

Well, I'm no expert on AppArmor policies. This is what I have:

#include <tunables/global>

/lib/systemd/systemd-timesyncd {
  #include <abstractions/nameservice>

  capability setgid,
  capability setuid,
  capability sys_time,
  capability setpcap,

  /etc/ld.so.cache r,
  /etc/systemd/timesyncd.conf r,
  /lib/systemd/systemd-timesyncd mr,
  /lib/x86_64-linux-gnu/libattr.so.* mr,
  /lib/x86_64-linux-gnu/libc-*.so mr,
  /lib/x86_64-linux-gnu/libcap.so.* mr,
  /lib/x86_64-linux-gnu/libm-*.so mr,
  /lib/x86_64-linux-gnu/libnsl-*.so mr,
  /lib/x86_64-linux-gnu/libpthread-*.so mr,
  /lib/x86_64-linux-gnu/libresolv-*.so mr,
  /proc/cmdline r,
  /proc/sys/kernel/random/boot_id r,
  /run/systemd/netif/state r,
  /var/lib/systemd/clock w,
}

The x86_64-linux-gnu part looks x86-64 and Debian-specific to me.

> 
>>> Hmm, LimitNPROC=2 unfortunately doesn't do what we want it to do:
>>> since the daemon drops the to the systemd-timesync user on its own,
>>> PID 1 invokes it as root, not as the the systemd-timesync user, which
>>> makes the excercise pointless... We should probably handle this better
>>> though, and warn, instead of silently accepting it. Added that to the
>>> TODO list now.
>>
>> This probably applies to other limits too?
> 
> RLIMIT_NPROC is kinda special. While the other resource limits are
> per-process, this one is actual per-user. This means that if we apply
> the other limits to the process we forked off, where we will invoke
> execve() in next, then this will stay locl to that process and be
> properly inherited into the final daemon.
> 
> Or in other words: the other resource limits are exposed nicely in
> systemd, and are already useful. It's just RLIMIT_NPROC that is a bit
> useless currently...
> 
> Lennart
>