[PATCH] drm/amdkfd: change kfd process kref count at creation

Felix Kuehling felix.kuehling at amd.com
Thu Oct 10 19:01:51 UTC 2024


On 2024-10-09 18:16, Chen, Xiaogang wrote:
>
> On 10/9/2024 4:45 PM, Felix Kuehling wrote:
>>
>> On 2024-10-09 17:02, Chen, Xiaogang wrote:
>>>
>>> On 10/9/2024 3:38 PM, Felix Kuehling wrote:
>>>> On 2024-10-09 14:08, Xiaogang.Chen wrote:
>>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>>
>>>>> kfd process kref count(process->ref) is initialized to 1 by 
>>>>> kref_init. After
>>>>> it is created not need to increaes its kref. Instad add kfd 
>>>>> process kref at kfd
>>>>> process mmu notifier allocation since we decrease the ref at 
>>>>> free_notifier of
>>>>> mmu_notifier_ops, so pair them.
>>>>
>>>> That's not correct. kfd_create_process returns a struct kfd_process 
>>>> * reference. That gets stored by the caller in filep->private_data. 
>>>> That requires incrementing the reference count. You can have 
>>>> multiple references to the same struct kfd_process if user mode 
>>>> opens /dev/kfd multiple times. The reference is released in 
>>>> kfd_release. Your change breaks that use case.
>>>>
>>> ok, if user mode open and close /dev/kfd multiple times(current 
>>> Thunk only allows user process open the kfd node once)  the change 
>>> will break this use case.
>>>> The reference released in kfd_process_free_notifier is only one per 
>>>> process and it's the reference created by kref_init.
>>>
>>> I think we can increase kref if find_process return true(the user 
>>> process already created kfd process). If find_process return false 
>>> do not increase kref since kref_init has been set to 1.
>>>
>>> Or change find_process(thread, false) to find_process(thread, true) 
>>> that will increase kref if it finds kfd process has been created.
>>>
>>> The idea is to pair kref update between alloc_notifier and 
>>> free_notifier of mmu_notifier_ops for same process(mm). That would 
>>> seem natural.
>>
>> What's the problem you're trying to solve? Is it just a cosmetic 
>> issue? The MMU notifier is registered in create_process (not 
>> kfd_create_process). If you add a kref_get in 
>> kfd_process_alloc_notifier you need to kfd_unref_process somewhere in 
>> create_process. I don't think it's worth the trouble and only risks 
>> introducing more reference counting bugs.
>
> It is for making code cleaner or natural to read. mmu_notifier_get is 
> the last call at create_process. If mmu_notifier_get fail the process 
> is freed: kfree(process).  If create_process success 
> kfd_create_process return that process anyway(after create_process 
> kfd_create_process creates sys entries that not affect return created 
> kfd process). The finally result is same that kref is 2: one for kfd 
> process creation, one for mmu notifier allocation.

Currently, when you call kfd_create_process for the first time, it 
returns with kref=2. One reference for the MMU notifier, and one for 
file->private_data.

Subsequent invocations of kfd_create_process when the process already 
exists should increment the kref by one to track the additional 
reference put into the new file->private_data.

If you can come up with a patch that preserves this logic _and makes the 
code simpler and more readable_, I will consider approving it. Also keep 
in mind that your patch would need to be ported to the DKMS branch, 
where there are two different code paths to support older kernels that 
don't have mmu_notifier_get/put.

Regards,
   Felix


>
> Regards
>
> Xiaogang
>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Regards
>>>
>>> Xiaogang
>>>
>>>
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 +++++---
>>>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> index d07acf1b2f93..7c5471d7d743 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> @@ -899,8 +899,6 @@ struct kfd_process *kfd_create_process(struct 
>>>>> task_struct *thread)
>>>>> init_waitqueue_head(&process->wait_irq_drain);
>>>>>       }
>>>>>   out:
>>>>> -    if (!IS_ERR(process))
>>>>> -        kref_get(&process->ref);
>>>>>       mutex_unlock(&kfd_processes_mutex);
>>>>>       mmput(thread->mm);
>>>>>   @@ -1191,7 +1189,11 @@ static struct mmu_notifier 
>>>>> *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>>         srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>   -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>>> +    if (p) {
>>>>> +        kref_get(&p->ref);
>>>>> +        return &p->mmu_notifier;
>>>>> +    }
>>>>> +    return ERR_PTR(-ESRCH);
>>>>>   }
>>>>>     static void kfd_process_free_notifier(struct mmu_notifier *mn)


More information about the amd-gfx mailing list