[PATCH v2] drm/amdkfd: change kfd process kref count at creation

Felix Kuehling felix.kuehling at amd.com
Fri Oct 18 19:14:15 UTC 2024


On 2024-10-11 10:41, Xiaogang.Chen wrote:
> From: Xiaogang Chen <xiaogang.chen at amd.com>
>
> kfd process kref count(process->ref) is initialized to 1 by kref_init. After
> it is created not need to increaes its kref. Instad add kfd process kref at kfd
> process mmu notifier allocation since we decrease the ref at free_notifier of
> mmu_notifier_ops, so pair them.
>
> When user process opens kfd node multiple times the kfd process kref is
> increased each time to balance kfd node close operation.
>
> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>

Thanks, this is an elegant solution, IMO. The reference returned by 
kfd_create_process comes either from find_process or create_process. And 
the extra reference that gets released by the free_notifier gets 
allocated by the alloc_notifier. I think there is a race condition, 
though. See inline.


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index d07acf1b2f93..78bf918abf92 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>   		goto out;
>   	}
>   
> -	/* A prior open of /dev/kfd could have already created the process. */
> -	process = find_process(thread, false);
> +	/* A prior open of /dev/kfd could have already created the process.
> +	 * find_process will increase process kref in this case
> +	 */
> +	process = find_process(thread, true);
>   	if (process) {
>   		pr_debug("Process already found\n");
>   	} else {
> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>   		init_waitqueue_head(&process->wait_irq_drain);
>   	}
>   out:
> -	if (!IS_ERR(process))
> -		kref_get(&process->ref);
>   	mutex_unlock(&kfd_processes_mutex);
>   	mmput(thread->mm);
>   
> @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>   
>   	srcu_read_unlock(&kfd_processes_srcu, idx);
>   
> -	return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
> +	if (p) {
> +		kref_get(&p->ref);

This should be inside the srcu. I think you could use 
kfd_lookup_process_by_mm instead of open-coding the SRCU locking and 
find_process_by_mm. This does the lookup and reference counting safely 
already.

Regards,
   Felix

> +		return &p->mmu_notifier;
> +	}
> +
> +	return ERR_PTR(-ESRCH);
>   }
>   
>   static void kfd_process_free_notifier(struct mmu_notifier *mn)


More information about the amd-gfx mailing list