[PATCH 1/8] drm/gem: Fix race in drm_gem_handle_create_tail()
Simona Vetter
simona.vetter at ffwll.ch
Wed May 28 09:26:20 UTC 2025
On Wed, May 28, 2025 at 11:12:59AM +0200, Simona Vetter wrote:
> Object creation is a careful dance where we must guarantee that the
> object is fully constructed before it is visible to other threads, and
> GEM buffer objects are no difference.
>
> Final publishing happens by calling drm_gem_handle_create(). After
> that the only allowed thing to do is call drm_gem_object_put() because
> a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id
> (which is trivial since we have a linear allocator) can already tear
> down the object again.
>
> Luckily most drivers get this right, the very few exceptions I've
> pinged the relevant maintainers for. Unfortunately we also need
> drm_gem_handle_create() when creating additional handles for an
> already existing object (e.g. GETFB ioctl or the various bo import
> ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as
> the only exported function to stop these issues from happening.
>
> Now unfortunately the implementation of drm_gem_handle_create() isn't
> living up to standards: It does correctly finishe object
> initialization at the global level, and hence is safe against a
> concurrent tear down. But it also sets up the file-private aspects of
> the handle, and that part goes wrong: We fully register the object in
> the drm_file.object_idr before calling drm_vma_node_allow() or
> obj->funcs->open, which opens up races against concurrent removal of
> that handle in drm_gem_handle_delete().
>
> Fix this with the usual two-stage approach of first reserving the
> handle id, and then only registering the object after we've completed
> the file-private setup.
>
> Jacek reported this with a testcase of concurrently calling GEM_CLOSE
> on a freshly-created object (which also destroys the object), but it
> should be possible to hit this with just additional handles created
> through import or GETFB without completed destroying the underlying
> object with the concurrent GEM_CLOSE ioctl calls.
>
> Note that the close-side of this race was fixed in f6cd7daecff5 ("drm:
> Release driver references to handle before making it available
> again"), which means a cool 9 years have passed until someone noticed
> that we need to make this symmetry or there's still gaps left :-/
> Without the 2-stage close approach we'd still have a race, therefore
> that's an integral part of this bugfix.
>
> More importantly, this means we can have NULL pointers behind
> allocated id in our drm_file.object_idr. We need to check for that
> now:
>
> - drm_gem_handle_delete() checks for ERR_OR_NULL already
>
> - drm_gem.c:object_lookup() also chekcs for NULL
>
> - drm_gem_release() should never be called if there's another thread
> still existing that could call into an IOCTL that creates a new
> handle, so cannot race. For paranoia I added a NULL check to
> drm_gem_object_release_handle() though.
>
> - most drivers (etnaviv, i915, msm) are find because they use
> idr_find, which maps both ENOENT and NULL to NULL.
>
> - vmgfx is already broken vmw_debugfs_gem_info_show() because NULL
> pointers might exist due to drm_gem_handle_delete(). This needs a
> separate patch. This is because idr_for_each_entry terminates on the
> first NULL entry and so might not iterate over everything.
>
> - similar for amd in amdgpu_debugfs_gem_info_show() and
> amdgpu_gem_force_release(). The latter is really questionable though
> since it's a best effort hack and there's no way to close all the
> races. Needs separate patches.
>
> - xe is really broken because it not uses idr_for_each_entry() but
> also drops the drm_file.table_lock, which can wreak the idr iterator
> state if you're unlucky enough. Maybe another reason to look into
> the drm fdinfo memory stats instead of hand-rolling too much.
>
> - drm_show_memory_stats() is also broken since it uses
> idr_for_each_entry. But since that's a preexisting bug I'll follow
> up with a separate patch.
I've already reworded the commit message locally since I now think
idr_for_each_entry is entirely fine.
-Sima
>
> Reported-by: Jacek Lawrynowicz <jacek.lawrynowicz at linux.intel.com>
> Cc: stable at vger.kernel.org
> Cc: Jacek Lawrynowicz <jacek.lawrynowicz at linux.intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> Cc: Maxime Ripard <mripard at kernel.org>
> Cc: Thomas Zimmermann <tzimmermann at suse.de>
> Cc: David Airlie <airlied at gmail.com>
> Cc: Simona Vetter <simona at ffwll.ch>
> Signed-off-by: Simona Vetter <simona.vetter at intel.com>
> Signed-off-by: Simona Vetter <simona.vetter at ffwll.ch>
> ---
> drivers/gpu/drm/drm_gem.c | 10 +++++++++-
> include/drm/drm_file.h | 3 +++
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 1e659d2660f7..e4e20dda47b1 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -279,6 +279,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
> struct drm_file *file_priv = data;
> struct drm_gem_object *obj = ptr;
>
> + if (WARN_ON(!data))
> + return 0;
> +
> if (obj->funcs->close)
> obj->funcs->close(obj, file_priv);
>
> @@ -399,7 +402,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
> idr_preload(GFP_KERNEL);
> spin_lock(&file_priv->table_lock);
>
> - ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
> + ret = idr_alloc(&file_priv->object_idr, NULL, 1, 0, GFP_NOWAIT);
>
> spin_unlock(&file_priv->table_lock);
> idr_preload_end();
> @@ -420,6 +423,11 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
> goto err_revoke;
> }
>
> + /* mirrors drm_gem_handle_delete to avoid races */
> + spin_lock(&file_priv->table_lock);
> + obj = idr_replace(&file_priv->object_idr, obj, handle);
> + WARN_ON(obj != NULL);
> + spin_unlock(&file_priv->table_lock);
> *handlep = handle;
> return 0;
>
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 5c3b2aa3e69d..d344d41e6cfe 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -300,6 +300,9 @@ struct drm_file {
> *
> * Mapping of mm object handles to object pointers. Used by the GEM
> * subsystem. Protected by @table_lock.
> + *
> + * Note that allocated entries might be NULL as a transient state when
> + * creating or deleting a handle.
> */
> struct idr object_idr;
>
> --
> 2.49.0
>
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the dri-devel
mailing list