[PATCH v15 00/17] arm64: untag user pointers passed to the kernel

Tue May 21 18:29:33 UTC 2019

On Mon, May 20, 2019 at 04:53:07PM -0700, Evgenii Stepanov wrote:
> On Fri, May 17, 2019 at 7:49 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
> > IMO (RFC for now), I see two ways forward:
> >
> > 1. Make this a user space problem and do not allow tagged pointers into
> >    the syscall ABI. A libc wrapper would have to convert structures,
> >    parameters before passing them into the kernel. Note that we can
> >    still support the hardware MTE in the kernel by enabling tagged
> >    memory ranges, saving/restoring tags etc. but not allowing tagged
> >    addresses at the syscall boundary.
> >
> > 2. Similar shim to the above libc wrapper but inside the kernel
> >    (arch/arm64 only; most pointer arguments could be covered with an
> >    __SC_CAST similar to the s390 one). There are two differences from
> >    what we've discussed in the past:
> >
> >    a) this is an opt-in by the user which would have to explicitly call
> >       prctl(). If it returns -ENOTSUPP etc., the user won't be allowed
> >       to pass tagged pointers to the kernel. This would probably be the
> >       responsibility of the C lib to make sure it doesn't tag heap
> >       allocations. If the user did not opt-in, the syscalls are routed
> >       through the normal path (no untagging address shim).
> >
> >    b) ioctl() and other blacklisted syscalls (prctl) will not accept
> >       tagged pointers (to be documented in Vicenzo's ABI patches).
[...]
> Any userspace shim approach is problematic for Android because of the
> apps that use raw system calls. AFAIK, all apps written in Go are in
> that camp - I'm not sure how common they are, but getting them all
> recompiled is probably not realistic.

That's a fair point (I wasn't expecting it would get much traction
anyway ;)). OTOH, it allows upstreaming of the MTE patches while we
continue the discussions around TBI.

> The way I see it, a patch that breaks handling of tagged pointers is
> not that different from, say, a patch that adds a wild pointer
> dereference. Both are bugs; the difference is that (a) the former
> breaks a relatively uncommon target and (b) it's arguably an easier
> mistake to make. If MTE adoption goes well, (a) will not be the case
> for long.

It's also the fact such patch would go unnoticed for a long time until
someone exercises that code path. And when they do, the user would be
pretty much in the dark trying to figure what what went wrong, why a
SIGSEGV or -EFAULT happened. What's worse, we can't even say we fixed
all the places where it matters in the current kernel codebase (ignoring
future patches).

I think we should revisit the static checking discussions we had last
year. Run-time checking (even with compiler instrumentation and
syzkaller fuzzing) would only cover the code paths specific to a Linux
or Android installation.

> This is a bit of a chicken-and-egg problem. In a world where memory
> allocators on one or several popular platforms generate pointers with
> non-zero tags, any such breakage will be caught in testing.
> Unfortunately to reach that state we need the kernel to start
> accepting tagged pointers first, and then hold on for a couple of
> years until userspace catches up.

Would the kernel also catch up with providing a stable ABI? Because we
have two moving targets.

On one hand, you have Android or some Linux distro that stick to a
stable kernel version for some time, so they have better chance of
clearing most of the problems. On the other hand, we have mainline
kernel that gets over 500K lines every release. As maintainer, I can't
rely on my testing alone as this is on a limited number of platforms. So
my concern is that every kernel release has a significant chance of
breaking the ABI, unless we have a better way of identifying potential
issues.

> Perhaps we can start by whitelisting ioctls by driver?

This was also raised by Ruben in private but without a (static) tool to
to check, manually going through all the drivers doesn't scale. It's
very likely that most drivers don't care, just a get_user/put_user is
already handled by these patches. Searching for find_vma() was
identifying one such use-case but is this sufficient? Are there other
cases we need to explicitly untag a pointer?

The other point I'd like feedback on is 2.a above. I see _some_ value
into having the user opt-in to this relaxed ABI rather than blinding
exposing it to all applications. Dave suggested (in private) a new
personality (e.g. PER_LINUX_TBI) inherited by children. It would be the
responsibility of the C library to check the current personality bits
and only tag pointers on allocation *if* the kernel allowed it. The
kernel could provide the AT_FLAGS bit as in Vincenzo's patches if the
personality was set but can't set it retrospectively if the user called
sys_personality. By default, /sbin/init would not have this personality
and libc would not tag pointers, so we can guarantee that your distro
boots normally with a new kernel version. We could have an envp that
gets caught by /sbin/init so you can pass it on the kernel command line
(or a dynamic loader at run-time). But the default should be the current
ABI behaviour.

We can enforce the current behaviour by having access_ok() check the
personality or a TIF flag but we may relax this enforcement at some
point in the future as we learn more about the implications of TBI.

Thanks.

-- 
Catalin