[PATCH] drm/gem: Expose the buffer object handle to userspace last

Christian König christian.koenig at amd.com
Thu Feb 23 09:17:54 UTC 2023


Am 20.02.23 um 11:23 schrieb Tvrtko Ursulin:
>
> On 20/02/2023 10:01, Christian König wrote:
>> Am 20.02.23 um 10:55 schrieb Tvrtko Ursulin:
>>>
>>> Hi,
>>>
>>> On 14/02/2023 13:59, Christian König wrote:
>>>> Am 14.02.23 um 13:50 schrieb Tvrtko Ursulin:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>
>>>>> Currently drm_gem_handle_create_tail exposes the handle to userspace
>>>>> before the buffer object construction is complete. Allowing userspace
>>>>> to work against a partially constructed object, which may also be in
>>>>> the process of having its creation fail, can have a range of negative
>>>>> outcomes.
>>>>>
>>>>> A lot of those will depend on what the individual drivers are doing
>>>>> in their obj->funcs->open() callbacks, with a common failure mode
>>>>> being -ENOMEM from drm_vma_node_allow.
>>>>>
>>>>> We can make sure none of this can happen by allocating a handle last,
>>>>> although with a downside that more of the function now runs under the
>>>>> dev->object_name_lock.
>>>>>
>>>>> Looking into the individual drivers open() hooks, we have
>>>>> amdgpu_gem_object_open which seems like it could have a potential 
>>>>> security
>>>>> issue without this change.
>>>>>
>>>>> A couple of drivers, like qxl_gem_object_open and vmw_gem_object_open,
>>>>> implement no-op hooks, so there is no impact for them.
>>>>>
>>>>> A number of others require a deeper look by their individual owners
>>>>> to assess the impact. Those are lima_gem_object_open,
>>>>> nouveau_gem_object_open, panfrost_gem_open, radeon_gem_object_open
>>>>> and virtio_gpu_gem_object_open.
>>>>>
>>>>> Putting aside the risk assessment of the above, some common scenarios
>>>>> to think about are along these lines:
>>>>>
>>>>> 1)
>>>>> Userspace closes a handle by speculatively "guessing" it from a 
>>>>> second
>>>>> thread.
>>>>>
>>>>> This results in an unreachable buffer object and hence a memory leak.
>>>>>
>>>>> 2)
>>>>> Same as 1), but object is in the process of getting closed (failed
>>>>> creation).
>>>>>
>>>>> The second thread is then able to recycle the handle, and idr_remove
>>>>> in the first thread would then remove a handle it does not own from
>>>>> the idr.
>>>>>
>>>>> 3)
>>>>> Going back to the earlier per-driver problem space - individual
>>>>> impact assessment of allowing a second thread to access and operate
>>>>> on a partially constructed handle / object. (Can something crash?
>>>>> Leak information?)
>>>>>
>>>>> In terms of identifying when the problem started, I will tag some
>>>>> patches as references, but not all of them, if any, actually point to
>>>>> a broken state. I am just identifying points at which more
>>>>> opportunity for issues to arise was added.
>>>>
>>>> Yes, I've looked into this once as well, but couldn't completely 
>>>> solve it for some reason.
>>>>
>>>> Give me a day or two to get this tested and all the logic swapped 
>>>> back into my head again.
>>>
>>> Managed to recollect what the problem with earlier attempts was?
>>
>> Nope, that's way too long ago. I can only assume that I ran into 
>> problems with the object_name_lock.
>>
>> Probably best to double-check that this doesn't result in a lock 
>> inversion when somebody grabs the reservation lock in their ->load() 
>> callback.
>
> Hmm I don't immediately follow the connection. But I have only found 
> radeon_driver_load_kms as using the load callback. Is there any 
> lockdep enabled CI for that driver which could tell us if there is a 
> problem there?

We don't have CI for radeon and most likely never will; that hw is just 
too old. But we also have amdgpu_gem_object_open() and that looks 
suspiciously like trouble.

The function makes sure that every BO is registered in the VM 
housekeeping of the drm_file and while doing so grabs a few locks. 
I'm not sure what the locking order of those is.

Could be that this will work, could be that it breaks. I will ping 
internally today to see if somebody from my team can take a look at this 
patch.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Christian.
>>>>
>>>>>
>>>>> References: 304eda32920b ("drm/gem: add hooks to notify driver 
>>>>> when object handle is created/destroyed")
>>>>> References: ca481c9b2a3a ("drm/gem: implement vma access management")
>>>>> References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
>>>>> Cc: dri-devel at lists.freedesktop.org
>>>>> Cc: Rob Clark <robdclark at chromium.org>
>>>>> Cc: Ben Skeggs <bskeggs at redhat.com>
>>>>> Cc: David Herrmann <dh.herrmann at gmail.com>
>>>>> Cc: Noralf Trønnes <noralf at tronnes.org>
>>>>> Cc: David Airlie <airlied at gmail.com>
>>>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>>>> Cc: amd-gfx at lists.freedesktop.org
>>>>> Cc: lima at lists.freedesktop.org
>>>>> Cc: nouveau at lists.freedesktop.org
>>>>> Cc: Steven Price <steven.price at arm.com>
>>>>> Cc: virtualization at lists.linux-foundation.org
>>>>> Cc: spice-devel at lists.freedesktop.org
>>>>> Cc: Zack Rusin <zackr at vmware.com>
>>>>> ---
>>>>>   drivers/gpu/drm/drm_gem.c | 48 +++++++++++++++++++--------------------
>>>>>   1 file changed, 24 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>>>>> index aa15c52ae182..e3d897bca0f2 100644
>>>>> --- a/drivers/gpu/drm/drm_gem.c
>>>>> +++ b/drivers/gpu/drm/drm_gem.c
>>>>> @@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>>>>>                  u32 *handlep)
>>>>>   {
>>>>>       struct drm_device *dev = obj->dev;
>>>>> -    u32 handle;
>>>>>       int ret;
>>>>>
>>>>>       WARN_ON(!mutex_is_locked(&dev->object_name_lock));
>>>>>       if (obj->handle_count++ == 0)
>>>>>           drm_gem_object_get(obj);
>>>>> +    ret = drm_vma_node_allow(&obj->vma_node, file_priv);
>>>>> +    if (ret)
>>>>> +        goto err_put;
>>>>> +
>>>>> +    if (obj->funcs->open) {
>>>>> +        ret = obj->funcs->open(obj, file_priv);
>>>>> +        if (ret)
>>>>> +            goto err_revoke;
>>>>> +    }
>>>>> +
>>>>>       /*
>>>>> -     * Get the user-visible handle using idr.  Preload and perform
>>>>> -     * allocation under our spinlock.
>>>>> +     * Get the user-visible handle using idr as the _last_ step.
>>>>> +     * Preload and perform allocation under our spinlock.
>>>>>        */
>>>>>       idr_preload(GFP_KERNEL);
>>>>>       spin_lock(&file_priv->table_lock);
>>>>> -
>>>>>       ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
>>>>> -
>>>>>       spin_unlock(&file_priv->table_lock);
>>>>>       idr_preload_end();
>>>>> -    mutex_unlock(&dev->object_name_lock);
>>>>>       if (ret < 0)
>>>>> -        goto err_unref;
>>>>> -
>>>>> -    handle = ret;
>>>>> +        goto err_close;
>>>>> -    ret = drm_vma_node_allow(&obj->vma_node, file_priv);
>>>>> -    if (ret)
>>>>> -        goto err_remove;
>>>>> +    mutex_unlock(&dev->object_name_lock);
>>>>> -    if (obj->funcs->open) {
>>>>> -        ret = obj->funcs->open(obj, file_priv);
>>>>> -        if (ret)
>>>>> -            goto err_revoke;
>>>>> -    }
>>>>> +    *handlep = ret;
>>>>> -    *handlep = handle;
>>>>>       return 0;
>>>>> +err_close:
>>>>> +    if (obj->funcs->close)
>>>>> +        obj->funcs->close(obj, file_priv);
>>>>>   err_revoke:
>>>>>       drm_vma_node_revoke(&obj->vma_node, file_priv);
>>>>> -err_remove:
>>>>> -    spin_lock(&file_priv->table_lock);
>>>>> -    idr_remove(&file_priv->object_idr, handle);
>>>>> -    spin_unlock(&file_priv->table_lock);
>>>>> -err_unref:
>>>>> -    drm_gem_object_handle_put_unlocked(obj);
>>>>> +err_put:
>>>>> +    if (--obj->handle_count == 0)
>>>>> +        drm_gem_object_put(obj);
>>>>> +
>>>>> +    mutex_unlock(&dev->object_name_lock);
>>>>> +
>>>>>       return ret;
>>>>>   }
>>>>
>>



More information about the amd-gfx mailing list