[PATCH v5 8/9] drm/amdgpu: validate userq activity status for GEM_VA unmap
Liang, Prike
Prike.Liang at amd.com
Tue Jul 8 09:28:03 UTC 2025
[Public]
Regards,
Prike
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Monday, July 7, 2025 5:43 PM
> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH v5 8/9] drm/amdgpu: validate userq activity status for GEM_VA
> unmap
>
> On 04.07.25 12:33, Prike Liang wrote:
> > The userq VA unmap requires validating queue status before unamapping
> > it, if user tries to unmap a busy userq by GEM VA IOCTL then the
> > driver should report an error for this illegal usage.
>
> Clear NAK to the whole approach.
>
> We should never deny unmapping a VA because it is used by an userqueue. This
> can cause a rat tail of problems in userspace.
>
> Instead we *must* suspend the userqueue when the VA is unmapped and deny
> resuming it.
>
> I think we can do that by adjusting the usage of the eviction fence for the BOs used
> by the user queue.
I'm not sure understand correctly, do we need a notifier for the userq VA unmap event
or at amdgpu_vm_bo_unmap() directly invokes a new function of canceling the userq restore work and marking the userq state as hang to avoid further using it.
Or only do the userq VA unmap check at the userq restore work? If not, could you give more detail on this solution?
> Regards,
> Christian.
>
> >
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 16 +++++++++++++---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++++
> > 2 files changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > index 30838e5279bd..221292b6417a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> > @@ -281,7 +281,7 @@ amdgpu_userq_map_helper(struct amdgpu_userq_mgr
> *uq_mgr,
> > return r;
> > }
> >
> > -static void
> > +static int
> > amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
> > struct amdgpu_usermode_queue *queue) { @@ -
> 290,10 +290,14 @@
> > amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
> >
> > if (f && !dma_fence_is_signaled(f)) {
> > ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
> > - if (ret <= 0)
> > + if (ret <= 0) {
> > drm_file_err(uq_mgr->file, "Timed out waiting for
> fence=%llu:%llu\n",
> > f->context, f->seqno);
> > + return -ETIMEDOUT;
> > + }
> > }
> > +
> > + return 0;
> > }
> >
> > static void
> > @@ -509,7 +513,13 @@ amdgpu_userq_destroy(struct drm_file *filp, int
> queue_id)
> > mutex_unlock(&uq_mgr->userq_mutex);
> > return -EINVAL;
> > }
> > - amdgpu_userq_wait_for_last_fence(uq_mgr, queue);
> > +
> > + if (amdgpu_userq_wait_for_last_fence(uq_mgr, queue)) {
> > + drm_warn(adev_to_drm(uq_mgr->adev), "Don't destroy a busy
> userq\n");
> > + /* For the fence signal timeout case, it requires resetting the busy
> queue.*/
> > + r = -ETIMEDOUT;
> > + }
> > +
> > r = amdgpu_bo_reserve(queue->db_obj.obj, true);
> > if (!r) {
> > amdgpu_bo_unpin(queue->db_obj.obj);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > index f042372d9f2e..c8cdd668a8f2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > @@ -1929,6 +1929,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device
> *adev,
> > struct amdgpu_bo_va_mapping *mapping;
> > struct amdgpu_vm *vm = bo_va->base.vm;
> > bool valid = true;
> > + int r;
> >
> > saddr /= AMDGPU_GPU_PAGE_SIZE;
> >
> > @@ -1949,6 +1950,14 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device
> *adev,
> > return -ENOENT;
> > }
> >
> > + /* It's unlikely to happen that the mapping userq hasn't been idled
> > + * during user requests GEM unmap IOCTL except for forcing the unmap
> > + * from user space.
> > + */
> > + r = amdgpu_userq_gem_va_unmap_validate(vm, saddr);
> > + if (unlikely(r && r != -EBUSY))
> > + dev_warn(adev->dev, "Here should be an improper unmap request
> from
> > +user space\n");
> > +
> > list_del(&mapping->list);
> > amdgpu_vm_it_remove(mapping, &vm->va);
> > mapping->bo_va = NULL;
More information about the amd-gfx
mailing list