[PATCH] drm/amdgpu: fix drm leases being broken on radv

Koenig, Christian Christian.Koenig at amd.com
Wed Apr 17 11:18:35 UTC 2019


Am 17.04.19 um 13:06 schrieb Daniel Vetter:
> On Wed, Apr 17, 2019 at 12:29 PM Koenig, Christian
> <Christian.Koenig at amd.com> wrote:
>> Hi guys,
>>
>> well going back to the beginning once more because something doesn't fit
>> together here.
>>
>> I mean what exactly is the purpose of 8059add0478e "drm: allow render
>> capable master with DRM_AUTH ioctls"?
>>
>> When I read the code correctly the effect is that we ignore the DRM_AUTH
>> flag when also the DRM_RENDER_ALLOW flag is set and the driver is a
>> render node driver.
>>
>> Well when this is correct then why the heck aren't we removing the
>> DRM_AUTH flag from the affected IOCTLs instead?
>>
>> The only common IOCTLs affecting this are DRM_IOCTL_PRIME_HANDLE_TO_FD
>> and DRM_IOCTL_PRIME_FD_TO_HANDLE and those are protected by capability
>> flags separately.
>>
>> So the patch 8059add0478e doesn't make any sense as far as I can tell.
>> What I'm missing?
> The idea is that if your driver supports render node (i.e. isolates
> unpriviledged clients), then we could allow the same on primary nodes.
> There's a bunch of userspace which blindly only opens primary nodes
> (instead of also opening render nodes), and this makes them work
> without requiring root or similar. Afaiui the main offender is libva
> being really dense (and semi-abandonded, partially also because intel
> has changed for the worse in that area).

And exactly that's the problem, we can't do this. It's not only radv 
which breaks, but also older libdrm_amdgpu versions as well as a bunch 
of other stuff.

Key problem is that older libdrm_amdgpu code did exactly the same thing 
and called an IOCTL to check if a handle is authenticated or not. If I'm 
not completely mistaken the radv code could actually be copy&pasted from 
there.

Additional to that we might run into problems with the libdrm unit 
tests, but those probably didn't break obviously because they are almost 
all the time run as root.

> Note that the capability checks are for all ioctl, including driver
> private ones, i.e. rendering will Just Work. I think it makes sense to
> have that.
>
> I guess there might be 100% overlap between DRM_AUTH and
> DRM_RENDER_ALLOW in ioctl flags now, but fixing that would mean
> auditing all the driver ioctl tables. I guess we could add that as a
> todo.rst item. It would make sense that either the driver is isolating
> clients sufficiently to not need authenticated clients only, or allow
> them all.
>
> I guess we could lament why rendernodes are still not yet fully rolled
> out everywhere, despite all these years. But then I stopped being
> surprised about ecosystem inertia, and in the end a strict split
> between display (done on the primary) and rendering (preferrably done
> on render nodes) is not required on intel/amd/nv, so not many care to
> make this work better. The patch from Emil seemed like a neat little
> trick to get there faster, until the radv hiccup.

Well that is a clear NAK for that approach.

If you have problems with libva (which are design problems in libva and 
not kernel related at all) then please work around that by removing 
DRM_AUTH from the affected IOCTLs and not by messing up general 
authentication code.

Well, what you guys did here is a serious no-go.

Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>> Am 17.04.19 um 11:18 schrieb Christian König:
>>> Am 17.04.19 um 10:10 schrieb Daniel Vetter:
>>>> On Wed, Apr 17, 2019 at 09:09:27AM +0200, Christian König wrote:
>>>>> Am 16.04.19 um 23:40 schrieb Andres Rodriguez:
>>>>>> After a recent commit, access to the DRM_AUTH ioctls become more
>>>>>> permissive. This resulted in a buggy check for drm_master capabilities
>>>>>> inside radv stop working.
>>>>>>
>>>>>> This commit adds a backwards compatibility workaround so that the radv
>>>>>> drm_master check keeps working as previously expected.
>>>>>>
>>>>>> This fixes SteamVR and other drm lease based apps being broken since
>>>>>> v5.1-rc1
>>>>>>
>>>>>>    From the usermode side, radv will also be fixed to properly test for
>>>>>> master capabilities:
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/669
>>>>>>
>>>>>> Fixes: 8059add0478e ("drm: allow render capable master with
>>>>>> DRM_AUTH ioctls")
>>>>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>>>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>>>>> Cc: David Airlie <airlied at linux.ie>
>>>>>> Cc: Emil Velikov <emil.velikov at collabora.com>
>>>>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>>>>> Cc: Keith Packard <keithp at keithp.com>
>>>>>> Reviewed-by: Keith Packard <keithp at keithp.com>
>>>>>> Reviewed-by: Daniel Vetter <daniel at ffwll.ch>
>>>>> Well definitely a NAK. IIRC we have unit tests where the exactly
>>>>> first thing
>>>>> they do is querying AMDGPU_INFO_ACCEL_WORKING.
>>>>>
>>>>> And I definitely not going to risk breaking those just to fix buggy
>>>>> behavior
>>>>> in radv.
>>>> s/buggy/fragile :-)
>>>>
>>>> Option B would be to disable 8059add0478e ("drm: allow render capable
>>>> master with DRM_AUTH ioctls") just for amdgpu.ko, but that's a bit more
>>>> tricky to implement I guess (since it'd leak all over the place).
>>>>
>>>> Option C is to revert 8059add04, but that's a bit silly since it
>>>> seems to
>>>> work everywhere.
>>>>
>>>> Breaking radv isn't an option, because no regression. Aside: No one is
>>>> stopping amd folks from reviewing radv patches and making sure
>>>> there's no
>>>> fragile stuff in there.
>>>>
>>>> We discussed this quite a bit on irc with Ben and Keith and others, and
>>>> figured option A is the most promising to go forward with. Anything
>>>> using
>>>> amdgpu_device_init (which I think are all the umds, but I didn't check)
>>>> will keep working, as will radv leases/vkdisplay, plus we can keep
>>>> 8059add04 for everyone (not just everony except amdgpu). If that means
>>>> breaking a few unit tests ... *shrugh*. Needs patches to fix them
>>>> ofc, but
>>>> shouldn't be that much work really.
>>> I think you miss the point here: This patch will break the render node
>>> interface!
>>>
>>> E.g. take another look at those lines of code:
>>>> +        if (fpriv->is_first_ioctl && !filp->is_master)
>>>> +            return -EACCES;
>>> This will return -EACCES on the first accel working query which is
>>> doesn't comes from the DRM master. And it is irrelevant if its the
>>> primary or the render node.
>>>
>>> So this patch most likely breaks tons of things and is *definitely* a
>>> complete NAK.
>>>
>>> Additional to that IIRC we have (rather old) code which behaves
>>> differently depending if it is called by the render node or the
>>> primary node. In other words it assumes that the primary node is
>>> authenticated.
>>>
>>> If I understood correctly what 8059add04 does than this is NOT a good
>>> to do in general.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> -Daniel
>>>>
>>>>> Christian.
>>>>>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  3 +++
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 ++
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 22 ++++++++++++++++++++++
>>>>>>     3 files changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> index 8d0d7f3dd5fb..e67bfee8d411 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> @@ -409,6 +409,9 @@ struct amdgpu_fpriv {
>>>>>>         struct mutex        bo_list_lock;
>>>>>>         struct idr        bo_list_handles;
>>>>>>         struct amdgpu_ctx_mgr    ctx_mgr;
>>>>>> +
>>>>>> +    /* Part of a backwards compatibility hack */
>>>>>> +    bool            is_first_ioctl;
>>>>>>     };
>>>>>>     int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv
>>>>>> **fpriv);
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index 7419ea8a388b..cd438afa7092 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -1128,6 +1128,7 @@ long amdgpu_drm_ioctl(struct file *filp,
>>>>>>                   unsigned int cmd, unsigned long arg)
>>>>>>     {
>>>>>>         struct drm_file *file_priv = filp->private_data;
>>>>>> +    struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>>>>>>         struct drm_device *dev;
>>>>>>         long ret;
>>>>>>         dev = file_priv->minor->dev;
>>>>>> @@ -1136,6 +1137,7 @@ long amdgpu_drm_ioctl(struct file *filp,
>>>>>>             return ret;
>>>>>>         ret = drm_ioctl(filp, cmd, arg);
>>>>>> +    fpriv->is_first_ioctl = false;
>>>>>>         pm_runtime_mark_last_busy(dev->dev);
>>>>>>         pm_runtime_put_autosuspend(dev->dev);
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> index e860412043bb..00c0a2fb2a69 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>>>> @@ -455,6 +455,7 @@ static int amdgpu_hw_ip_info(struct
>>>>>> amdgpu_device *adev,
>>>>>>     static int amdgpu_info_ioctl(struct drm_device *dev, void *data,
>>>>>> struct drm_file *filp)
>>>>>>     {
>>>>>>         struct amdgpu_device *adev = dev->dev_private;
>>>>>> +    struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>>>>         struct drm_amdgpu_info *info = data;
>>>>>>         struct amdgpu_mode_info *minfo = &adev->mode_info;
>>>>>>         void __user *out = (void __user
>>>>>> *)(uintptr_t)info->return_pointer;
>>>>>> @@ -470,6 +471,26 @@ static int amdgpu_info_ioctl(struct drm_device
>>>>>> *dev, void *data, struct drm_file
>>>>>>         switch (info->query) {
>>>>>>         case AMDGPU_INFO_ACCEL_WORKING:
>>>>>> +        /* HACK: radv has a fragile open-coded check for drm master
>>>>>> +         * The pattern for the call is:
>>>>>> +         *     open(DRM_NODE_PRIMARY)
>>>>>> +         *     drmModeCommandWrite( query=AMDGPU_INFO_ACCEL_WORKING )
>>>>>> +         *
>>>>>> +         * After commit 8059add04 this check stopped working due to
>>>>>> +         * operations on a primary node from non-master clients
>>>>>> becoming
>>>>>> +         * more permissive.
>>>>>> +         *
>>>>>> +         * This backwards compatibility hack targets the calling
>>>>>> pattern
>>>>>> +         * above to fail radv from thinking it has master access
>>>>>> to the
>>>>>> +         * display system ( without reverting 8059add04 ).
>>>>>> +         *
>>>>>> +         * Users of libdrm will not be affected as some other
>>>>>> ioctls are
>>>>>> +         * performed between open() and the AMDGPU_INFO_ACCEL_WORKING
>>>>>> +         * query.
>>>>>> +         */
>>>>>> +        if (fpriv->is_first_ioctl && !filp->is_master)
>>>>>> +            return -EACCES;
>>>>>> +
>>>>>>             ui32 = adev->accel_working;
>>>>>>             return copy_to_user(out, &ui32, min(size, 4u)) ? -EFAULT
>>>>>> : 0;
>>>>>>         case AMDGPU_INFO_CRTC_FROM_ID:
>>>>>> @@ -987,6 +1008,7 @@ int amdgpu_driver_open_kms(struct drm_device
>>>>>> *dev, struct drm_file *file_priv)
>>>>>>                 goto error_vm;
>>>>>>         }
>>>>>> +    fpriv->is_first_ioctl = true;
>>>>>>         mutex_init(&fpriv->bo_list_lock);
>>>>>>         idr_init(&fpriv->bo_list_handles);
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>



More information about the dri-devel mailing list