[PATCH 1/1] drm/amdkfd: Remove queue sysfs and doorbell after unmapping

Felix Kuehling felix.kuehling at amd.com
Tue Jul 5 18:48:20 UTC 2022


On 2022-06-10 21:03, Philip Yang wrote:
> If destroying/unmapping queue failed, application may destroy queue
> again, cause below kernel warning backtrace.
>
> For outstanding queues, either applications forget to destroy or failed
> to destroy, kfd_process_notifier_release will remove queue sysfs
> objects, kfd_process_wq_release will free queue doorbell.
>
>   refcount_t: underflow; use-after-free.
>   WARNING: CPU: 7 PID: 3053 at lib/refcount.c:28
>    Call Trace:
>     kobject_put+0xd6/0x1a0
>     kfd_procfs_del_queue+0x27/0x30 [amdgpu]
>     pqm_destroy_queue+0xeb/0x240 [amdgpu]
>     kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
>     kfd_ioctl+0x27d/0x500 [amdgpu]
>     do_syscall_64+0x35/0x80
>
>   WARNING: CPU: 2 PID: 3053 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:400
>    Call Trace:
>     deallocate_doorbell.isra.0+0x39/0x40 [amdgpu]
>     destroy_queue_cpsch+0xb3/0x270 [amdgpu]
>     pqm_destroy_queue+0x108/0x240 [amdgpu]
>     kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
>     kfd_ioctl+0x27d/0x500 [amdgpu]
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 4 ++--
>   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +-
>   2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e1797657b04c..1c519514ca1a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1876,8 +1876,6 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
>   	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
>   			q->properties.type)];
>   
> -	deallocate_doorbell(qpd, q);
> -
>   	if ((q->properties.type == KFD_QUEUE_TYPE_SDMA) ||
>   	    (q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)) {
>   		deallocate_sdma_queue(dqm, q);
> @@ -1898,6 +1896,8 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
>   		}
>   	}
>   
> +	deallocate_doorbell(qpd, q);
> +

I'm not sure what this change is meant to do. I don't see an early 
return in this function, so deallocate_doorbell will always be executed 
either way.


>   	/*
>   	 * Unconditionally decrement this counter, regardless of the queue's
>   	 * type
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index dc00484ff484..99f2a6412201 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -419,7 +419,6 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
>   	}
>   
>   	if (pqn->q) {
> -		kfd_procfs_del_queue(pqn->q);
>   		dqm = pqn->q->device->dqm;
>   		retval = dqm->ops.destroy_queue(dqm, &pdd->qpd, pqn->q);
>   		if (retval) {
> @@ -439,6 +438,7 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
>   		if (dev->shared_resources.enable_mes)
>   			amdgpu_amdkfd_free_gtt_mem(dev->adev,
>   						   pqn->q->gang_ctx_bo);
> +		kfd_procfs_del_queue(pqn->q);

This part makes sense.

Regards,
   Felix


>   		uninit_queue(pqn->q);
>   	}
>   


More information about the amd-gfx mailing list