[PATCH 1/1] drm/amdkfd: Remove queue sysfs and doorbell after unmapping
Felix Kuehling
felix.kuehling at amd.com
Tue Jul 5 18:48:20 UTC 2022
On 2022-06-10 21:03, Philip Yang wrote:
> If destroying/unmapping queue failed, application may destroy queue
> again, cause below kernel warning backtrace.
>
> For outstanding queues, either applications forget to destroy or failed
> to destroy, kfd_process_notifier_release will remove queue sysfs
> objects, kfd_process_wq_release will free queue doorbell.
>
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 7 PID: 3053 at lib/refcount.c:28
> Call Trace:
> kobject_put+0xd6/0x1a0
> kfd_procfs_del_queue+0x27/0x30 [amdgpu]
> pqm_destroy_queue+0xeb/0x240 [amdgpu]
> kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
> kfd_ioctl+0x27d/0x500 [amdgpu]
> do_syscall_64+0x35/0x80
>
> WARNING: CPU: 2 PID: 3053 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:400
> Call Trace:
> deallocate_doorbell.isra.0+0x39/0x40 [amdgpu]
> destroy_queue_cpsch+0xb3/0x270 [amdgpu]
> pqm_destroy_queue+0x108/0x240 [amdgpu]
> kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
> kfd_ioctl+0x27d/0x500 [amdgpu]
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 4 ++--
> drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e1797657b04c..1c519514ca1a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1876,8 +1876,6 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
> mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
> q->properties.type)];
>
> - deallocate_doorbell(qpd, q);
> -
> if ((q->properties.type == KFD_QUEUE_TYPE_SDMA) ||
> (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)) {
> deallocate_sdma_queue(dqm, q);
> @@ -1898,6 +1896,8 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
> }
> }
>
> + deallocate_doorbell(qpd, q);
> +
I'm not sure what this change is meant to do. I don't see an early
return in this function, so deallocate_doorbell will always be executed
either way.
> /*
> * Unconditionally decrement this counter, regardless of the queue's
> * type
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index dc00484ff484..99f2a6412201 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -419,7 +419,6 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
> }
>
> if (pqn->q) {
> - kfd_procfs_del_queue(pqn->q);
> dqm = pqn->q->device->dqm;
> retval = dqm->ops.destroy_queue(dqm, &pdd->qpd, pqn->q);
> if (retval) {
> @@ -439,6 +438,7 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
> if (dev->shared_resources.enable_mes)
> amdgpu_amdkfd_free_gtt_mem(dev->adev,
> pqn->q->gang_ctx_bo);
> + kfd_procfs_del_queue(pqn->q);
This part makes sense.
Regards,
Felix
> uninit_queue(pqn->q);
> }
>
More information about the amd-gfx
mailing list