[systemd-devel] [PATCH] Apply ProtectSystem to non-merged /usr directories

Tue Oct 21 06:57:10 PDT 2014

On 2014-10-21 14:28, Lennart Poettering wrote:
> We explicitly make no
> assumptions on /opt because nobody knows right now what it is
> supposed to be...

Sure, I wasn't disputing that point.

> Same for /usr, /bin, /sbin, and the other stuff Martin's
> patch added: we cannot make assumptions about it, it might be (and is
> in real life) set up in different ways, and we don't want to be in
> that game.

That's why I didn't suggest that you (as upstream) should be in that
game, but that distributions and administrators should be able to do so
themselves.

>> Therefore, may I suggest to make this configurable in
>> /etc/systemd/system.conf: [...]
>> If you're willing to accept a patch for this, I'd provide one.
>
> I really disagree that this would be a good idea. We should give
> clear guidelines how things should be set up here to take full benefit of
> the functionality. Because this is about an agreement between the OS
> people and the upstream developers of packages to run on the OS. We
> want to make sure they can make the assumptions everywhere, which are
> not configurable and behave differently everywhere. For example, I
> really want that let's say apache sets ProtectSystem= and can be sure
> it will just not break things on our OSes. And because of that its
> impact should be only on the safe subset, and it should be the same
> on all installations, and not be subject to configuration.

Debian's systemd package currently includes a variant of Martin's patch
that does include additional directories. So your point that
ProtectSystem= does the same thing on every distro is already not true.

Of course, if you make it configurable, people can shoot themselves in
the foot. But you already have a ton of global options in system.conf
that can break a lot of software if people do stupid stuff with it:

  - set a global CapabilityBoundingSet= that's very restrictive
       - either the system doesn't boot at all because some essential
         stuff is completely missing
       - or it boots but some services don't work because they rely
         on certain things
  - set SystemCallArchitectures=native when non-native software is
    installed
  - set DefaultEnvironment=LD_TRACE_LOADED_OBJECTS=1 or the like
  - set DefaultTimeoutStartSec=1 to break any unit that takes longer
    than 1 second to start but doesn't set its own timeout because it
    assumes the default timeout is sane
  - DefaultLimitCPU=1
  - ...
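
As a sketch of what I mean, something like the following in system.conf
would do plenty of damage. The option names are the ones listed above;
the concrete values (in particular the single capability left in the
bounding set) are just made up for illustration:

```ini
# /etc/systemd/system.conf -- deliberately harmful example, do not use
[Manager]
# Assumed value: any very short list has the same effect
CapabilityBoundingSet=CAP_CHOWN
# Breaks non-native binaries if any are installed
SystemCallArchitectures=native
# Makes the dynamic linker list libraries instead of running the program
DefaultEnvironment=LD_TRACE_LOADED_OBJECTS=1
# Any unit relying on the default start timeout now has 1 second
DefaultTimeoutStartSec=1
# 1 second of CPU time per process by default
DefaultLimitCPU=1
```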

Also, to come back to your apache example: it's not clear that
ProtectSystem= won't break things even if it only makes /usr read-only.
I have seen DocumentRoots beneath /usr in the wild (/usr/local/www or
the like), with people running dynamic webapps that had to write into
that tree. If you then upgrade from a package version that did not
include ProtectSystem= (perhaps because it only shipped a SysV init
script) to a package version that does include ProtectSystem=, things
will break.

I actually agree with your sentiment of having an agreement between
upstream developers and the core OS - I just think I would like to
interpret the matter a little differently:

To me, ProtectSystem= is supposed to be a protection of all the files
     a) installed on a system (not created by the user)
and
     b) not subject to modification by typical services. (i.e. not a
        cache / status file / ...)
For distros with /usr-move, this boils down to /usr and /boot (and /etc
if =full). For other distros, there may be a few additional directories.
And on a custom installation, it may include further directories, such
as /opt.

If I am an upstream developer and ship a unit file with
ProtectSystem=full, my expectation is that directories that are supposed
to contain data not put there at installation (such as /var, /tmp,
/home, /srv, /run, ...) remain usable as before, but that systemd will
provide an extra layer of security around the rest of the installed
system. As an upstream developer, I don't know where the user's distro
decided to install stuff, whether it's in /usr or directly in /bin or
wherever. And I don't really care about that and I don't WANT to care
about that. If I really only want /usr to be read-only, I could just add
ReadOnlyDirectories=/usr to the unit file and be done with it. But I
don't want to care about distro details as an upstream developer, I want
the setting to just work[tm] and do the right thing[tm]. The fact that
it is a generically named option actually makes me expect an abstraction
of distro differences.

On the other hand, if I put my administrator hat on, as I said in the
last mail, I will know which additional directories may be present that
could also fall under that umbrella. And if something breaks because I
put in a stupid setting, that's my bad. By all means, put a big fat
warning in the docs that this setting is dangerous to fiddle around with.

I do see a good bit of orthogonality here:

  - Upstream developers can clearly expect that /usr-_type_ (!) stuff
    is protected by this setting and don't have to care about minute
    details.

  - You provide some sane initial defaults (just /usr and /boot)

  - Distros have the ability to refine that to their specific needs
    (/lib, /bin, /sbin)

  - Administrators have the further ability to make adjustments w.r.t.
    their local installation.

Alternatively, in the current state, if I want to have the same level of
protection:

  - As a distro I must either
        - patch systemd (that's what Debian is currently doing)
        - add ReadOnlyDirectories=... to every service file with
          ProtectSystem=

  - As an administrator I must
        - add drop-ins for every service with ProtectSystem= to add
          ReadOnlyDirectories=...
        - on every update recheck whether a service changed its
          ProtectSystem=... setting and adjust accordingly
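
To make the administrator case concrete, such a drop-in would look
roughly like this. The unit name and file name are hypothetical, and the
directory list is only what a non-merged-/usr layout might need:

```ini
# /etc/systemd/system/apache2.service.d/readonly-extra.conf
# (hypothetical unit and file name; one such file per affected service)
[Service]
# Extra directories that ProtectSystem= does not cover on this system
ReadOnlyDirectories=/bin /sbin /lib
```

And this has to be repeated, and kept in sync, for every service that
sets ProtectSystem=.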

That means that currently, ProtectSystem= is only a shortcut for
ReadOnlyDirectories=/usr -/boot [/etc]
But since the name itself is much more abstract, it could be so much
more useful to bridge cross-distro differences.
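
Spelled out, and assuming I read the current behaviour correctly, the two
modes are equivalent to the following hand-written settings (shown as two
alternative [Service] fragments, not to be combined in one unit):

```ini
# Equivalent of ProtectSystem=yes
[Service]
# The "-" prefix means the path is ignored if it does not exist
ReadOnlyDirectories=/usr -/boot

# Equivalent of ProtectSystem=full
[Service]
ReadOnlyDirectories=/usr -/boot /etc
```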

Finally, I just want to add a quick note that this kind of abstraction
is a good thing. The fact that you don't expose the kernel internals of
mount namespaces and read-only bind mounts directly but have a
semantically much more sound ReadOnlyDirectories= setting is a good
example of that. To me, whether ls is in /usr/bin or /bin is a detail,
the same way that the internal implementation of ReadOnlyDirectories=
is: if details about the kernel interface change at some point, service
unit writers don't have to care about that.

Anyway, sorry for the long email, I just wanted to lay out the case
better. Please think about this. If you are still completely against it,
I'll not press the issue.

Christian