[PATCH 9/9] drm/amdgpu: validate userq activity status for GEM_VA unmap

Fri Jun 6 09:39:50 UTC 2025

On 6/6/25 09:51, Liang, Prike wrote:
> [Public]
> 
>> From: Alex Deucher <alexdeucher at gmail.com>
>> Sent: Saturday, May 31, 2025 5:54 AM
>> To: Liang, Prike <Prike.Liang at amd.com>
>> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
>> <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
>> Lazar, Lijo <Lijo.Lazar at amd.com>
>> Subject: Re: [PATCH 9/9] drm/amdgpu: validate userq activity status for GEM_VA
>> unmap
>>
>> On Fri, May 30, 2025 at 4:05 AM Prike Liang <Prike.Liang at amd.com> wrote:
>>>
>>> The userq VA unmap requires validating the queue status before unmapping
>>> it. If the user tries to unmap an active userq via the GEM VA IOCTL, the
>>> driver should report an error for this illegal usage.
>>>
>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 16 +++++++++++++---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  5 +++++
>>>  2 files changed, 18 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> index e43a61f64755..e2275280554d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> @@ -234,7 +234,7 @@ amdgpu_userq_map_helper(struct amdgpu_userq_mgr *uq_mgr,
>>>         return r;
>>>  }
>>>
>>> -static void
>>> +static int
>>>  amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
>>>                                  struct amdgpu_usermode_queue *queue)
>>>  {
>>> @@ -243,10 +243,14 @@ amdgpu_userq_wait_for_last_fence(struct amdgpu_userq_mgr *uq_mgr,
>>>
>>>         if (f && !dma_fence_is_signaled(f)) {
>>>                 ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>>> -               if (ret <= 0)
>>> +               if (ret <= 0) {
>>>                         drm_file_err(uq_mgr->file, "Timed out waiting for fence=%llu:%llu\n",
>>>                                      f->context, f->seqno);
>>> +                       return -ETIMEDOUT;
>>> +               }
>>>         }
>>> +
>>> +       return 0;
>>>  }
>>>
>>>  static void
>>> @@ -464,7 +468,13 @@ amdgpu_userq_destroy(struct drm_file *filp, int queue_id)
>>>                 mutex_unlock(&uq_mgr->userq_mutex);
>>>                 return -EINVAL;
>>>         }
>>> -       amdgpu_userq_wait_for_last_fence(uq_mgr, queue);
>>> +
>>> +       if (amdgpu_userq_wait_for_last_fence(uq_mgr, queue)) {
>>> +               drm_warn(adev_to_drm(uq_mgr->adev), "Don't destroy a busy userq\n");
>>> +               mutex_unlock(&uq_mgr->userq_mutex);
>>> +               return -EINVAL;
>>> +       }
>>> +
>>>         r = amdgpu_bo_reserve(queue->db_obj.obj, true);
>>>         if (!r) {
>>>                 amdgpu_bo_unpin(queue->db_obj.obj);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 5e075e8f0ca3..168fc3835aaf 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -1930,6 +1930,11 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
>>>         struct amdgpu_vm *vm = bo_va->base.vm;
>>>         bool valid = true;
>>>
>>> +       if (amdgpu_userq_gem_va_unmap_wait(vm)) {
>>> +               dev_warn(adev->dev, "shouldn't unmap the VA for an active userq\n");
>>> +               return -EINVAL;
>>> +       }
>>
>> There's no need to wait for anything.  Just return an error if the userq has not been
>> destroyed yet.
> [Prike] Currently the userq destroy IOCTL request always comes after amdgpu_vm_bo_unmap().
> The original idea behind the destroy validation was that userq BOs can only be unmapped when they are
> idle. So do we need to busy-wait on the BOs here, or just test and emit a warning/error message when
> the driver tries to unmap a busy BO?

Yeah, that discussion came up before.

As far as I can see, we can't return an error from unmap just because the BO might still be in use by a queue.

At least on the gfx side, such behavior could easily lead to crashes, because userspace doesn't expect an unmap to fail and tries to reuse the address range.

What we can do is deactivate that queue and prevent it from becoming active again.
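
As a rough illustration of that last idea, here is a minimal user-space sketch in C. It is not amdgpu code: struct uq_model, model_va_unmap() and model_queue_activate() are hypothetical names invented for this example, and the locking and hardware unmapping a real driver would need are left out. The unmap path always succeeds, but a queue whose VA range overlaps the unmapped range is deactivated and refuses to become active again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only; none of these names exist in amdgpu. */
enum uq_state {
	UQ_ACTIVE,	/* queue may be scheduled */
	UQ_INACTIVE,	/* queue is idle but may be activated */
	UQ_POISONED	/* queue lost a mapping and must stay inactive */
};

struct uq_model {
	enum uq_state state;
	unsigned long va_start;	/* first address used by the queue */
	unsigned long va_end;	/* one past the last address used */
};

/*
 * Unmap [start, end). This never reports failure to the caller; if the
 * range overlaps the queue's VA range, the queue is deactivated for good.
 */
static int model_va_unmap(struct uq_model *q, unsigned long start,
			  unsigned long end)
{
	assert(q != NULL && start < end);

	if (start < q->va_end && q->va_start < end)
		q->state = UQ_POISONED;

	return 0;
}

/* Try to (re)activate the queue; a poisoned queue is always refused. */
static int model_queue_activate(struct uq_model *q)
{
	assert(q != NULL);

	if (q->state == UQ_POISONED)
		return -1;

	q->state = UQ_ACTIVE;
	return 0;
}
```

With this shape, userspace can reuse the address range right after the unmap returns, and the only visible consequence of the illegal usage is that the affected queue stops working.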

Regards,
Christian.

> 
>> Alex
>>
>>> +
>>>         saddr /= AMDGPU_GPU_PAGE_SIZE;
>>>
>>>         list_for_each_entry(mapping, &bo_va->valids, list) {
>>> --
>>> 2.34.1
>>>