[PATCH v5 8/9] drm/amdgpu: validate userq activity status for GEM_VA unmap

Tue Jul 8 10:18:27 UTC 2025

On 08.07.25 11:28, Liang, Prike wrote:
> [Public]
> 
> Regards,
>       Prike
> 
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Monday, July 7, 2025 5:43 PM
>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH v5 8/9] drm/amdgpu: validate userq activity status for GEM_VA
>> unmap
>>
>> On 04.07.25 12:33, Prike Liang wrote:
>>> The userq VA unmap requires validating queue status before unamapping
>>> it, if user tries to unmap a busy userq by GEM VA IOCTL then the
>>> driver should report an error for this illegal usage.
>>
>> Clear NAK to the whole approach.
>>
>> We should never deny unmapping a VA because it is used by an userqueue. This
>> can cause a rat tail of problems in userspace.
>>
>> Instead we *must* suspend the userqueue when the VA is unmapped and deny
>> resuming it.
>>
>> I think we can do that by adjusting the usage of the eviction fence for the BOs used
>> by the user queue.
> I'm not sure understand correctly, do we need a notifier for the userq VA unmap event
> or at amdgpu_vm_bo_unmap() directly invokes a new function of canceling the userq restore work and marking the userq state as hang to avoid further using it.
> Or only do the userq VA unmap check at the userq restore work? If not, could you give more detail on this solution?

I haven't fully flashed out the solution either.

What we need is a) that amdgpu_vm_bo_unmap() waits for the eviction fence of the user queues to signal which suspends the user queue and b) that the starting/resuming the user queue validates the VA addresses they use and makes sure that the eviction fence is publicated.

This way the queue never starts or resumes while the VAs are not valid.

Regards,
Christian.

> 
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 16 +++++++++++++---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  9 +++++++++
>>>  2 files changed, 22 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> index 30838e5279bd..221292b6417a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> @@ -281,7 +281,7 @@ amdgpu_userq_map_helper(struct amdgpu_userq_mgr
>> *uq_mgr,
>>>     return r;
>>>  }
>>>
>>> -static void
>>> +static int
>>>  amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
>>>                              struct amdgpu_usermode_queue *queue)  { @@ -
>> 290,10 +290,14 @@
>>> amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
>>>
>>>     if (f && !dma_fence_is_signaled(f)) {
>>>             ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>>> -           if (ret <= 0)
>>> +           if (ret <= 0) {
>>>                     drm_file_err(uq_mgr->file, "Timed out waiting for
>> fence=%llu:%llu\n",
>>>                                  f->context, f->seqno);
>>> +                   return -ETIMEDOUT;
>>> +           }
>>>     }
>>> +
>>> +   return 0;
>>>  }
>>>
>>>  static void
>>> @@ -509,7 +513,13 @@ amdgpu_userq_destroy(struct drm_file *filp, int
>> queue_id)
>>>             mutex_unlock(&uq_mgr->userq_mutex);
>>>             return -EINVAL;
>>>     }
>>> -   amdgpu_userq_wait_for_last_fence(uq_mgr, queue);
>>> +
>>> +   if (amdgpu_userq_wait_for_last_fence(uq_mgr, queue)) {
>>> +           drm_warn(adev_to_drm(uq_mgr->adev), "Don't destroy a busy
>> userq\n");
>>> +           /* For the fence signal timeout case, it requires resetting the busy
>> queue.*/
>>> +           r = -ETIMEDOUT;
>>> +   }
>>> +
>>>     r = amdgpu_bo_reserve(queue->db_obj.obj, true);
>>>     if (!r) {
>>>             amdgpu_bo_unpin(queue->db_obj.obj);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index f042372d9f2e..c8cdd668a8f2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -1929,6 +1929,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device
>> *adev,
>>>     struct amdgpu_bo_va_mapping *mapping;
>>>     struct amdgpu_vm *vm = bo_va->base.vm;
>>>     bool valid = true;
>>> +   int r;
>>>
>>>     saddr /= AMDGPU_GPU_PAGE_SIZE;
>>>
>>> @@ -1949,6 +1950,14 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device
>> *adev,
>>>                     return -ENOENT;
>>>     }
>>>
>>> +   /* It's unlikely to happen that the mapping userq hasn't been idled
>>> +    * during user requests GEM unmap IOCTL except for forcing the unmap
>>> +    * from user space.
>>> +    */
>>> +   r = amdgpu_userq_gem_va_unmap_validate(vm, saddr);
>>> +   if (unlikely(r && r != -EBUSY))
>>> +           dev_warn(adev->dev, "Here should be an improper unmap request
>> from
>>> +user space\n");
>>> +
>>>     list_del(&mapping->list);
>>>     amdgpu_vm_it_remove(mapping, &vm->va);
>>>     mapping->bo_va = NULL;
>