[PATCH] drm/amdgpu: fix drm leases being broken on radv

Wed Apr 17 12:00:42 UTC 2019

On Wed, Apr 17, 2019 at 11:18:35AM +0000, Koenig, Christian wrote:
> Am 17.04.19 um 13:06 schrieb Daniel Vetter:
> > On Wed, Apr 17, 2019 at 12:29 PM Koenig, Christian
> > <Christian.Koenig at amd.com> wrote:
> >> Hi guys,
> >>
> >> well going back to the beginning once more because something doesn't fit
> >> together here.
> >>
> >> I mean what exactly is the purpose of 8059add0478e "drm: allow render
> >> capable master with DRM_AUTH ioctls"?
> >>
> >> When I read the code correctly the effect is that we ignore the DRM_AUTH
> >> flag when also the DRM_RENDER_ALLOW flag is set and the driver is a
> >> render node driver.
> >>
> >> Well when this is correct then why the heck aren't we removing the
> >> DRM_AUTH flag from the affected IOCTLs instead?
> >>
> >> The only common IOCTLs affecting this are DRM_IOCTL_PRIME_HANDLE_TO_FD
> >> and DRM_IOCTL_PRIME_FD_TO_HANDLE and those are protected by capability
> >> flags separately.
> >>
> >> So the patch 8059add0478e doesn't make any sense as far as I can tell.
> >> What I'm missing?
> > The idea is that if your driver supports render node (i.e. isolates
> > unpriviledged clients), then we could allow the same on primary nodes.
> > There's a bunch of userspace which blindly only opens primary nodes
> > (instead of also opening render nodes), and this makes them work
> > without requiring root or similar. Afaiui the main offender is libva
> > being really dense (and semi-abandonded, partially also because intel
> > has changed for the worse in that area).
> 
> And exactly that's the problem, we can't do this. It's not only radv 
> which breaks, but also older libdrm_amdgpu versions as well as a bunch 
> of other stuff.
> 
> Key problem is that older libdrm_amdgpu code did exactly the same thing 
> and called an IOCTL to check if a handle is authenticated or not. If I'm 
> not completely mistaken the radv code could actually be copy&pasted from 
> there.
> 
> Additional to that we might run into problems with the libdrm unit 
> tests, but those probably didn't break obviously because they are almost 
> all the time run as root.

I looked at the very first version of amdgpu_device_initialize. That still
does some other ioctl calls before querying AMDGPU_INFO_ACCEL_WORKING. So
should work even on the very first amdgpu based userspace release.

I also quickly checked for these unit tests, and at least the ones in
libdrm don't have a call to AMDGPU_INFO_ACCEL_WORKING directly after
opening the file. If you want I can also crawl through rocm, amdvlk and
ddx sources.

> > Note that the capability checks are for all ioctl, including driver
> > private ones, i.e. rendering will Just Work. I think it makes sense to
> > have that.
> >
> > I guess there might be 100% overlap between DRM_AUTH and
> > DRM_RENDER_ALLOW in ioctl flags now, but fixing that would mean
> > auditing all the driver ioctl tables. I guess we could add that as a
> > todo.rst item. It would make sense that either the driver is isolating
> > clients sufficiently to not need authenticated clients only, or allow
> > them all.
> >
> > I guess we could lament why rendernodes are still not yet fully rolled
> > out everywhere, despite all these years. But then I stopped being
> > surprised about ecosystem inertia, and in the end a strict split
> > between display (done on the primary) and rendering (preferrably done
> > on render nodes) is not required on intel/amd/nv, so not many care to
> > make this work better. The patch from Emil seemed like a neat little
> > trick to get there faster, until the radv hiccup.
> 
> Well that is a clear NAK for that approach.
> 
> If you have problems with libva (which are design problems in libva and 
> not kernel related at all) then please work around that by removing 
> DRM_AUTH from the affected IOCTLs and not by messing up general 
> authentication code.

That's what we've done defacto, except we've done it for all drivers. As
mentioned above, option C would be to disable that for amdgpu. Which seems
silly, given that the workaround hack is just 3 lines.

I also think that the goal of the original patch is pretty sound and
overall well motivated, given the reality that rendernodes aren't super
popular unfortunately. Also, from an access permission pov it's the right
thing to do really.

> Well, what you guys did here is a serious no-go.

Not really understanding your reasons for an unconditional nak on all this
still.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch