[PATCH] drm/gem: Expose the buffer object handle to userspace last
Christian König
christian.koenig at amd.com
Thu Feb 23 09:17:54 UTC 2023
Am 20.02.23 um 11:23 schrieb Tvrtko Ursulin:
>
> On 20/02/2023 10:01, Christian König wrote:
>> Am 20.02.23 um 10:55 schrieb Tvrtko Ursulin:
>>>
>>> Hi,
>>>
>>> On 14/02/2023 13:59, Christian König wrote:
>>>> Am 14.02.23 um 13:50 schrieb Tvrtko Ursulin:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>
>>>>> Currently drm_gem_handle_create_tail exposes the handle to userspace
>>>>> before the buffer object constructions is complete. This allowing
>>>>> of working against a partially constructed object, which may also
>>>>> be in
>>>>> the process of having its creation fail, can have a range of negative
>>>>> outcomes.
>>>>>
>>>>> A lot of those will depend on what the individual drivers are
>>>>> doing in
>>>>> their obj->funcs->open() callbacks, and also with a common failure
>>>>> mode
>>>>> being -ENOMEM from drm_vma_node_allow.
>>>>>
>>>>> We can make sure none of this can happen by allocating a handle last,
>>>>> although with a downside that more of the function now runs under the
>>>>> dev->object_name_lock.
>>>>>
>>>>> Looking into the individual drivers open() hooks, we have
>>>>> amdgpu_gem_object_open which seems like it could have a potential
>>>>> security
>>>>> issue without this change.
>>>>>
>>>>> A couple drivers like qxl_gem_object_open and vmw_gem_object_open
>>>>> implement no-op hooks so no impact for them.
>>>>>
>>>>> A bunch of other require a deeper look by individual owners to
>>>>> asses for
>>>>> impact. Those are lima_gem_object_open, nouveau_gem_object_open,
>>>>> panfrost_gem_open, radeon_gem_object_open and
>>>>> virtio_gpu_gem_object_open.
>>>>>
>>>>> Putting aside the risk assesment of the above, some common
>>>>> scenarios to
>>>>> think about are along these lines:
>>>>>
>>>>> 1)
>>>>> Userspace closes a handle by speculatively "guessing" it from a
>>>>> second
>>>>> thread.
>>>>>
>>>>> This results in an unreachable buffer object so, a memory leak.
>>>>>
>>>>> 2)
>>>>> Same as 1), but object is in the process of getting closed (failed
>>>>> creation).
>>>>>
>>>>> The second thread is then able to re-cycle the handle and
>>>>> idr_remove would
>>>>> in the first thread would then remove the handle it does not own
>>>>> from the
>>>>> idr.
>>>>>
>>>>> 3)
>>>>> Going back to the earlier per driver problem space - individual
>>>>> impact
>>>>> assesment of allowing a second thread to access and operate on a
>>>>> partially
>>>>> constructed handle / object. (Can something crash? Leak information?)
>>>>>
>>>>> In terms of identifying when the problem started I will tag some
>>>>> patches
>>>>> as references, but not all, if even any, of them actually point to a
>>>>> broken state. I am just identifying points at which more
>>>>> opportunity for
>>>>> issues to arise was added.
>>>>
>>>> Yes I've looked into this once as well, but couldn't completely
>>>> solve it for some reason.
>>>>
>>>> Give me a day or two to get this tested and all the logic swapped
>>>> back into my head again.
>>>
>>> Managed to recollect what the problem with earlier attempts was?
>>
>> Nope, that's way to long ago. I can only assume that I ran into
>> problems with the object_name_lock.
>>
>> Probably best to double check if that doesn't result in a lock
>> inversion when somebody grabs the reservation lock in their ->load()
>> callback.
>
> Hmm I don't immediately follow the connection. But I have only found
> radeon_driver_load_kms as using the load callback. Is there any
> lockdep enabled CI for that driver which could tell us if there is a
> problem there?
We don't have CI for radeon and most likely never will, that hw is just
to old. But we also have amdgpu_gem_object_open() and that looks
suspiciously like trouble.
The function makes sure that every BO is registered in the VM house
keeping functions of the drm_file and while doing so grabs a few locks.
I'm not sure what the locking order of those are.
Could be that this will work, could be that it breaks. I will ping
internally today if somebody from my team can take a look at this patch.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Christian.
>>>>
>>>>>
>>>>> References: 304eda32920b ("drm/gem: add hooks to notify driver
>>>>> when object handle is created/destroyed")
>>>>> References: ca481c9b2a3a ("drm/gem: implement vma access management")
>>>>> References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
>>>>> Cc: dri-devel at lists.freedesktop.org
>>>>> Cc: Rob Clark <robdclark at chromium.org>
>>>>> Cc: Ben Skeggs <bskeggs at redhat.com>
>>>>> Cc: David Herrmann <dh.herrmann at gmail.com>
>>>>> Cc: Noralf Trønnes <noralf at tronnes.org>
>>>>> Cc: David Airlie <airlied at gmail.com>
>>>>> Cc: Daniel Vetter <daniel at ffwll.ch>
>>>>> Cc: amd-gfx at lists.freedesktop.org
>>>>> Cc: lima at lists.freedesktop.org
>>>>> Cc: nouveau at lists.freedesktop.org
>>>>> Cc: Steven Price <steven.price at arm.com>
>>>>> Cc: virtualization at lists.linux-foundation.org
>>>>> Cc: spice-devel at lists.freedesktop.org
>>>>> Cc: Zack Rusin <zackr at vmware.com>
>>>>> ---
>>>>> drivers/gpu/drm/drm_gem.c | 48
>>>>> +++++++++++++++++++--------------------
>>>>> 1 file changed, 24 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
>>>>> index aa15c52ae182..e3d897bca0f2 100644
>>>>> --- a/drivers/gpu/drm/drm_gem.c
>>>>> +++ b/drivers/gpu/drm/drm_gem.c
>>>>> @@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file
>>>>> *file_priv,
>>>>> u32 *handlep)
>>>>> {
>>>>> struct drm_device *dev = obj->dev;
>>>>> - u32 handle;
>>>>> int ret;
>>>>> WARN_ON(!mutex_is_locked(&dev->object_name_lock));
>>>>> if (obj->handle_count++ == 0)
>>>>> drm_gem_object_get(obj);
>>>>> + ret = drm_vma_node_allow(&obj->vma_node, file_priv);
>>>>> + if (ret)
>>>>> + goto err_put;
>>>>> +
>>>>> + if (obj->funcs->open) {
>>>>> + ret = obj->funcs->open(obj, file_priv);
>>>>> + if (ret)
>>>>> + goto err_revoke;
>>>>> + }
>>>>> +
>>>>> /*
>>>>> - * Get the user-visible handle using idr. Preload and perform
>>>>> - * allocation under our spinlock.
>>>>> + * Get the user-visible handle using idr as the _last_ step.
>>>>> + * Preload and perform allocation under our spinlock.
>>>>> */
>>>>> idr_preload(GFP_KERNEL);
>>>>> spin_lock(&file_priv->table_lock);
>>>>> -
>>>>> ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
>>>>> -
>>>>> spin_unlock(&file_priv->table_lock);
>>>>> idr_preload_end();
>>>>> - mutex_unlock(&dev->object_name_lock);
>>>>> if (ret < 0)
>>>>> - goto err_unref;
>>>>> -
>>>>> - handle = ret;
>>>>> + goto err_close;
>>>>> - ret = drm_vma_node_allow(&obj->vma_node, file_priv);
>>>>> - if (ret)
>>>>> - goto err_remove;
>>>>> + mutex_unlock(&dev->object_name_lock);
>>>>> - if (obj->funcs->open) {
>>>>> - ret = obj->funcs->open(obj, file_priv);
>>>>> - if (ret)
>>>>> - goto err_revoke;
>>>>> - }
>>>>> + *handlep = ret;
>>>>> - *handlep = handle;
>>>>> return 0;
>>>>> +err_close:
>>>>> + if (obj->funcs->close)
>>>>> + obj->funcs->close(obj, file_priv);
>>>>> err_revoke:
>>>>> drm_vma_node_revoke(&obj->vma_node, file_priv);
>>>>> -err_remove:
>>>>> - spin_lock(&file_priv->table_lock);
>>>>> - idr_remove(&file_priv->object_idr, handle);
>>>>> - spin_unlock(&file_priv->table_lock);
>>>>> -err_unref:
>>>>> - drm_gem_object_handle_put_unlocked(obj);
>>>>> +err_put:
>>>>> + if (--obj->handle_count == 0)
>>>>> + drm_gem_object_put(obj);
>>>>> +
>>>>> + mutex_unlock(&dev->object_name_lock);
>>>>> +
>>>>> return ret;
>>>>> }
>>>>
>>
More information about the amd-gfx
mailing list