[PATCH] drm/amdgpu: fix drm leases being broken on radv

Daniel Vetter daniel at ffwll.ch
Wed Apr 17 11:06:24 UTC 2019


On Wed, Apr 17, 2019 at 12:29 PM Koenig, Christian
<Christian.Koenig at amd.com> wrote:
>
> Hi guys,
>
> well going back to the beginning once more because something doesn't fit
> together here.
>
> I mean what exactly is the purpose of 8059add0478e "drm: allow render
> capable master with DRM_AUTH ioctls"?
>
> When I read the code correctly the effect is that we ignore the DRM_AUTH
> flag when also the DRM_RENDER_ALLOW flag is set and the driver is a
> render node driver.
>
> Well when this is correct then why the heck aren't we removing the
> DRM_AUTH flag from the affected IOCTLs instead?
>
> The only common IOCTLs affecting this are DRM_IOCTL_PRIME_HANDLE_TO_FD
> and DRM_IOCTL_PRIME_FD_TO_HANDLE and those are protected by capability
> flags separately.
>
> So the patch 8059add0478e doesn't make any sense as far as I can tell.
> What I'm missing?

The idea is that if your driver supports render node (i.e. isolates
unpriviledged clients), then we could allow the same on primary nodes.
There's a bunch of userspace which blindly only opens primary nodes
(instead of also opening render nodes), and this makes them work
without requiring root or similar. Afaiui the main offender is libva
being really dense (and semi-abandonded, partially also because intel
has changed for the worse in that area).

Note that the capability checks are for all ioctl, including driver
private ones, i.e. rendering will Just Work. I think it makes sense to
have that.

I guess there might be 100% overlap between DRM_AUTH and
DRM_RENDER_ALLOW in ioctl flags now, but fixing that would mean
auditing all the driver ioctl tables. I guess we could add that as a
todo.rst item. It would make sense that either the driver is isolating
clients sufficiently to not need authenticated clients only, or allow
them all.

I guess we could lament why rendernodes are still not yet fully rolled
out everywhere, despite all these years. But then I stopped being
surprised about ecosystem inertia, and in the end a strict split
between display (done on the primary) and rendering (preferrably done
on render nodes) is not required on intel/amd/nv, so not many care to
make this work better. The patch from Emil seemed like a neat little
trick to get there faster, until the radv hiccup.
-Daniel

> Regards,
> Christian.
>
> Am 17.04.19 um 11:18 schrieb Christian König:
> > Am 17.04.19 um 10:10 schrieb Daniel Vetter:
> >> On Wed, Apr 17, 2019 at 09:09:27AM +0200, Christian König wrote:
> >>> Am 16.04.19 um 23:40 schrieb Andres Rodriguez:
> >>>> After a recent commit, access to the DRM_AUTH ioctls become more
> >>>> permissive. This resulted in a buggy check for drm_master capabilities
> >>>> inside radv stop working.
> >>>>
> >>>> This commit adds a backwards compatibility workaround so that the radv
> >>>> drm_master check keeps working as previously expected.
> >>>>
> >>>> This fixes SteamVR and other drm lease based apps being broken since
> >>>> v5.1-rc1
> >>>>
> >>>>   From the usermode side, radv will also be fixed to properly test for
> >>>> master capabilities:
> >>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/669
> >>>>
> >>>> Fixes: 8059add0478e ("drm: allow render capable master with
> >>>> DRM_AUTH ioctls")
> >>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
> >>>> Cc: Daniel Vetter <daniel at ffwll.ch>
> >>>> Cc: David Airlie <airlied at linux.ie>
> >>>> Cc: Emil Velikov <emil.velikov at collabora.com>
> >>>> Cc: Alex Deucher <alexander.deucher at amd.com>
> >>>> Cc: Keith Packard <keithp at keithp.com>
> >>>> Reviewed-by: Keith Packard <keithp at keithp.com>
> >>>> Reviewed-by: Daniel Vetter <daniel at ffwll.ch>
> >>> Well definitely a NAK. IIRC we have unit tests where the exactly
> >>> first thing
> >>> they do is querying AMDGPU_INFO_ACCEL_WORKING.
> >>>
> >>> And I definitely not going to risk breaking those just to fix buggy
> >>> behavior
> >>> in radv.
> >> s/buggy/fragile :-)
> >>
> >> Option B would be to disable 8059add0478e ("drm: allow render capable
> >> master with DRM_AUTH ioctls") just for amdgpu.ko, but that's a bit more
> >> tricky to implement I guess (since it'd leak all over the place).
> >>
> >> Option C is to revert 8059add04, but that's a bit silly since it
> >> seems to
> >> work everywhere.
> >>
> >> Breaking radv isn't an option, because no regression. Aside: No one is
> >> stopping amd folks from reviewing radv patches and making sure
> >> there's no
> >> fragile stuff in there.
> >>
> >> We discussed this quite a bit on irc with Ben and Keith and others, and
> >> figured option A is the most promising to go forward with. Anything
> >> using
> >> amdgpu_device_init (which I think are all the umds, but I didn't check)
> >> will keep working, as will radv leases/vkdisplay, plus we can keep
> >> 8059add04 for everyone (not just everony except amdgpu). If that means
> >> breaking a few unit tests ... *shrugh*. Needs patches to fix them
> >> ofc, but
> >> shouldn't be that much work really.
> >
> > I think you miss the point here: This patch will break the render node
> > interface!
> >
> > E.g. take another look at those lines of code:
> >> +        if (fpriv->is_first_ioctl && !filp->is_master)
> >> +            return -EACCES;
> > This will return -EACCES on the first accel working query which is
> > doesn't comes from the DRM master. And it is irrelevant if its the
> > primary or the render node.
> >
> > So this patch most likely breaks tons of things and is *definitely* a
> > complete NAK.
> >
> > Additional to that IIRC we have (rather old) code which behaves
> > differently depending if it is called by the render node or the
> > primary node. In other words it assumes that the primary node is
> > authenticated.
> >
> > If I understood correctly what 8059add04 does than this is NOT a good
> > to do in general.
> >
> > Regards,
> > Christian.
> >
> >> -Daniel
> >>
> >>> Christian.
> >>>
> >>>> ---
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  3 +++
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 ++
> >>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 22 ++++++++++++++++++++++
> >>>>    3 files changed, 27 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>> index 8d0d7f3dd5fb..e67bfee8d411 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>> @@ -409,6 +409,9 @@ struct amdgpu_fpriv {
> >>>>        struct mutex        bo_list_lock;
> >>>>        struct idr        bo_list_handles;
> >>>>        struct amdgpu_ctx_mgr    ctx_mgr;
> >>>> +
> >>>> +    /* Part of a backwards compatibility hack */
> >>>> +    bool            is_first_ioctl;
> >>>>    };
> >>>>    int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv
> >>>> **fpriv);
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>> index 7419ea8a388b..cd438afa7092 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>> @@ -1128,6 +1128,7 @@ long amdgpu_drm_ioctl(struct file *filp,
> >>>>                  unsigned int cmd, unsigned long arg)
> >>>>    {
> >>>>        struct drm_file *file_priv = filp->private_data;
> >>>> +    struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
> >>>>        struct drm_device *dev;
> >>>>        long ret;
> >>>>        dev = file_priv->minor->dev;
> >>>> @@ -1136,6 +1137,7 @@ long amdgpu_drm_ioctl(struct file *filp,
> >>>>            return ret;
> >>>>        ret = drm_ioctl(filp, cmd, arg);
> >>>> +    fpriv->is_first_ioctl = false;
> >>>>        pm_runtime_mark_last_busy(dev->dev);
> >>>>        pm_runtime_put_autosuspend(dev->dev);
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >>>> index e860412043bb..00c0a2fb2a69 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> >>>> @@ -455,6 +455,7 @@ static int amdgpu_hw_ip_info(struct
> >>>> amdgpu_device *adev,
> >>>>    static int amdgpu_info_ioctl(struct drm_device *dev, void *data,
> >>>> struct drm_file *filp)
> >>>>    {
> >>>>        struct amdgpu_device *adev = dev->dev_private;
> >>>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
> >>>>        struct drm_amdgpu_info *info = data;
> >>>>        struct amdgpu_mode_info *minfo = &adev->mode_info;
> >>>>        void __user *out = (void __user
> >>>> *)(uintptr_t)info->return_pointer;
> >>>> @@ -470,6 +471,26 @@ static int amdgpu_info_ioctl(struct drm_device
> >>>> *dev, void *data, struct drm_file
> >>>>        switch (info->query) {
> >>>>        case AMDGPU_INFO_ACCEL_WORKING:
> >>>> +        /* HACK: radv has a fragile open-coded check for drm master
> >>>> +         * The pattern for the call is:
> >>>> +         *     open(DRM_NODE_PRIMARY)
> >>>> +         *     drmModeCommandWrite( query=AMDGPU_INFO_ACCEL_WORKING )
> >>>> +         *
> >>>> +         * After commit 8059add04 this check stopped working due to
> >>>> +         * operations on a primary node from non-master clients
> >>>> becoming
> >>>> +         * more permissive.
> >>>> +         *
> >>>> +         * This backwards compatibility hack targets the calling
> >>>> pattern
> >>>> +         * above to fail radv from thinking it has master access
> >>>> to the
> >>>> +         * display system ( without reverting 8059add04 ).
> >>>> +         *
> >>>> +         * Users of libdrm will not be affected as some other
> >>>> ioctls are
> >>>> +         * performed between open() and the AMDGPU_INFO_ACCEL_WORKING
> >>>> +         * query.
> >>>> +         */
> >>>> +        if (fpriv->is_first_ioctl && !filp->is_master)
> >>>> +            return -EACCES;
> >>>> +
> >>>>            ui32 = adev->accel_working;
> >>>>            return copy_to_user(out, &ui32, min(size, 4u)) ? -EFAULT
> >>>> : 0;
> >>>>        case AMDGPU_INFO_CRTC_FROM_ID:
> >>>> @@ -987,6 +1008,7 @@ int amdgpu_driver_open_kms(struct drm_device
> >>>> *dev, struct drm_file *file_priv)
> >>>>                goto error_vm;
> >>>>        }
> >>>> +    fpriv->is_first_ioctl = true;
> >>>>        mutex_init(&fpriv->bo_list_lock);
> >>>>        idr_init(&fpriv->bo_list_handles);
> >>> _______________________________________________
> >>> dri-devel mailing list
> >>> dri-devel at lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the dri-devel mailing list