[PATCH] drm/amdgpu: fix drm leases being broken on radv

Koenig, Christian Christian.Koenig at amd.com
Wed Apr 17 09:18:17 UTC 2019


Am 17.04.19 um 10:10 schrieb Daniel Vetter:
> On Wed, Apr 17, 2019 at 09:09:27AM +0200, Christian König wrote:
>> Am 16.04.19 um 23:40 schrieb Andres Rodriguez:
>>> After a recent commit, access to the DRM_AUTH ioctls become more
>>> permissive. This resulted in a buggy check for drm_master capabilities
>>> inside radv stop working.
>>>
>>> This commit adds a backwards compatibility workaround so that the radv
>>> drm_master check keeps working as previously expected.
>>>
>>> This fixes SteamVR and other drm lease based apps being broken since
>>> v5.1-rc1
>>>
>>>   From the usermode side, radv will also be fixed to properly test for
>>> master capabilities:
>>> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/669
>>>
>>> Fixes: 8059add0478e ("drm: allow render capable master with DRM_AUTH ioctls")
>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>> Cc: David Airlie <airlied at linux.ie>
>>> Cc: Emil Velikov <emil.velikov at collabora.com>
>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>> Cc: Keith Packard <keithp at keithp.com>
>>> Reviewed-by: Keith Packard <keithp at keithp.com>
>>> Reviewed-by: Daniel Vetter <daniel at ffwll.ch>
>> Well definitely a NAK. IIRC we have unit tests where the exactly first thing
>> they do is querying AMDGPU_INFO_ACCEL_WORKING.
>>
>> And I definitely not going to risk breaking those just to fix buggy behavior
>> in radv.
> s/buggy/fragile :-)
>
> Option B would be to disable 8059add0478e ("drm: allow render capable
> master with DRM_AUTH ioctls") just for amdgpu.ko, but that's a bit more
> tricky to implement I guess (since it'd leak all over the place).
>
> Option C is to revert 8059add04, but that's a bit silly since it seems to
> work everywhere.
>
> Breaking radv isn't an option, because no regression. Aside: No one is
> stopping amd folks from reviewing radv patches and making sure there's no
> fragile stuff in there.
>
> We discussed this quite a bit on irc with Ben and Keith and others, and
> figured option A is the most promising to go forward with. Anything using
> amdgpu_device_init (which I think are all the umds, but I didn't check)
> will keep working, as will radv leases/vkdisplay, plus we can keep
> 8059add04 for everyone (not just everony except amdgpu). If that means
> breaking a few unit tests ... *shrugh*. Needs patches to fix them ofc, but
> shouldn't be that much work really.

I think you miss the point here: This patch will break the render node 
interface!

E.g. take another look at those lines of code:
> +		if (fpriv->is_first_ioctl && !filp->is_master)
> +			return -EACCES;
This will return -EACCES on the first accel working query which is 
doesn't comes from the DRM master. And it is irrelevant if its the 
primary or the render node.

So this patch most likely breaks tons of things and is *definitely* a 
complete NAK.

Additional to that IIRC we have (rather old) code which behaves 
differently depending if it is called by the render node or the primary 
node. In other words it assumes that the primary node is authenticated.

If I understood correctly what 8059add04 does than this is NOT a good to 
do in general.

Regards,
Christian.

> -Daniel
>
>> Christian.
>>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  3 +++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 ++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 22 ++++++++++++++++++++++
>>>    3 files changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 8d0d7f3dd5fb..e67bfee8d411 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -409,6 +409,9 @@ struct amdgpu_fpriv {
>>>    	struct mutex		bo_list_lock;
>>>    	struct idr		bo_list_handles;
>>>    	struct amdgpu_ctx_mgr	ctx_mgr;
>>> +
>>> +	/* Part of a backwards compatibility hack */
>>> +	bool			is_first_ioctl;
>>>    };
>>>    int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index 7419ea8a388b..cd438afa7092 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -1128,6 +1128,7 @@ long amdgpu_drm_ioctl(struct file *filp,
>>>    		      unsigned int cmd, unsigned long arg)
>>>    {
>>>    	struct drm_file *file_priv = filp->private_data;
>>> +	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>>>    	struct drm_device *dev;
>>>    	long ret;
>>>    	dev = file_priv->minor->dev;
>>> @@ -1136,6 +1137,7 @@ long amdgpu_drm_ioctl(struct file *filp,
>>>    		return ret;
>>>    	ret = drm_ioctl(filp, cmd, arg);
>>> +	fpriv->is_first_ioctl = false;
>>>    	pm_runtime_mark_last_busy(dev->dev);
>>>    	pm_runtime_put_autosuspend(dev->dev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> index e860412043bb..00c0a2fb2a69 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>> @@ -455,6 +455,7 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
>>>    static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>>    {
>>>    	struct amdgpu_device *adev = dev->dev_private;
>>> +	struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>    	struct drm_amdgpu_info *info = data;
>>>    	struct amdgpu_mode_info *minfo = &adev->mode_info;
>>>    	void __user *out = (void __user *)(uintptr_t)info->return_pointer;
>>> @@ -470,6 +471,26 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>    	switch (info->query) {
>>>    	case AMDGPU_INFO_ACCEL_WORKING:
>>> +		/* HACK: radv has a fragile open-coded check for drm master
>>> +		 * The pattern for the call is:
>>> +		 *     open(DRM_NODE_PRIMARY)
>>> +		 *     drmModeCommandWrite( query=AMDGPU_INFO_ACCEL_WORKING )
>>> +		 *
>>> +		 * After commit 8059add04 this check stopped working due to
>>> +		 * operations on a primary node from non-master clients becoming
>>> +		 * more permissive.
>>> +		 *
>>> +		 * This backwards compatibility hack targets the calling pattern
>>> +		 * above to fail radv from thinking it has master access to the
>>> +		 * display system ( without reverting 8059add04 ).
>>> +		 *
>>> +		 * Users of libdrm will not be affected as some other ioctls are
>>> +		 * performed between open() and the AMDGPU_INFO_ACCEL_WORKING
>>> +		 * query.
>>> +		 */
>>> +		if (fpriv->is_first_ioctl && !filp->is_master)
>>> +			return -EACCES;
>>> +
>>>    		ui32 = adev->accel_working;
>>>    		return copy_to_user(out, &ui32, min(size, 4u)) ? -EFAULT : 0;
>>>    	case AMDGPU_INFO_CRTC_FROM_ID:
>>> @@ -987,6 +1008,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>>>    			goto error_vm;
>>>    	}
>>> +	fpriv->is_first_ioctl = true;
>>>    	mutex_init(&fpriv->bo_list_lock);
>>>    	idr_init(&fpriv->bo_list_handles);
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel



More information about the dri-devel mailing list