[PATCH] drm/amdkfd: change kfd process kref count at creation

Felix Kuehling felix.kuehling at amd.com
Thu Oct 10 20:37:09 UTC 2024


On 2024-10-10 16:19, Chen, Xiaogang wrote:
>
> On 10/10/2024 2:01 PM, Felix Kuehling wrote:
>>
>> On 2024-10-09 18:16, Chen, Xiaogang wrote:
>>>
>>> On 10/9/2024 4:45 PM, Felix Kuehling wrote:
>>>>
>>>> On 2024-10-09 17:02, Chen, Xiaogang wrote:
>>>>>
>>>>> On 10/9/2024 3:38 PM, Felix Kuehling wrote:
>>>>>> On 2024-10-09 14:08, Xiaogang.Chen wrote:
>>>>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>>>>
>>>>>>> kfd process kref count(process->ref) is initialized to 1 by 
>>>>>>> kref_init. After
>>>>>>> it is created not need to increaes its kref. Instad add kfd 
>>>>>>> process kref at kfd
>>>>>>> process mmu notifier allocation since we decrease the ref at 
>>>>>>> free_notifier of
>>>>>>> mmu_notifier_ops, so pair them.
>>>>>>
>>>>>> That's not correct. kfd_create_process returns a struct 
>>>>>> kfd_process * reference. That gets stored by the caller in 
>>>>>> filep->private_data. That requires incrementing the reference 
>>>>>> count. You can have multiple references to the same struct 
>>>>>> kfd_process if user mode opens /dev/kfd multiple times. The 
>>>>>> reference is released in kfd_release. Your change breaks that use 
>>>>>> case.
>>>>>>
>>>>> ok, if user mode open and close /dev/kfd multiple times(current 
>>>>> Thunk only allows user process open the kfd node once)  the change 
>>>>> will break this use case.
>>>>>> The reference released in kfd_process_free_notifier is only one 
>>>>>> per process and it's the reference created by kref_init.
>>>>>
>>>>> I think we can increase kref if find_process return true(the user 
>>>>> process already created kfd process). If find_process return false 
>>>>> do not increase kref since kref_init has been set to 1.
>>>>>
>>>>> Or change find_process(thread, false) to find_process(thread, 
>>>>> true) that will increase kref if it finds kfd process has been 
>>>>> created.
>>>>>
>>>>> The idea is to pair kref update between alloc_notifier and 
>>>>> free_notifier of mmu_notifier_ops for same process(mm). That would 
>>>>> seem natural.
>>>>
>>>> What's the problem you're trying to solve? Is it just a cosmetic 
>>>> issue? The MMU notifier is registered in create_process (not 
>>>> kfd_create_process). If you add a kref_get in 
>>>> kfd_process_alloc_notifier you need to kfd_unref_process somewhere 
>>>> in create_process. I don't think it's worth the trouble and only 
>>>> risks introducing more reference counting bugs.
>>>
>>> It is for making code cleaner or natural to read. mmu_notifier_get 
>>> is the last call at create_process. If mmu_notifier_get fail the 
>>> process is freed: kfree(process). If create_process success 
>>> kfd_create_process return that process anyway(after create_process 
>>> kfd_create_process creates sys entries that not affect return 
>>> created kfd process). The finally result is same that kref is 2: one 
>>> for kfd process creation, one for mmu notifier allocation.
>>
>> Currently, when you call kfd_create_process for the first time, it 
>> returns with kref=2. One reference for the MMU notifier, and one for 
>> file->private_data.
>>
>> Subsequent invocations of kfd_create_process when the process already 
>> exists should increment the kref by one to track the additional 
>> reference put into the new file->private_data.
> one ways is changing find_process(thread, false) to 
> find_process(thread, true) at kfd_create_process. When kfd process 
> already exist find_process will call kref_get(&p->ref);
>>
>>
>> If you can come up with a patch that preserves this logic _and makes 
>> the code simpler and more readable_, I will consider approving it. 
>> Also keep in mind that your patch would need to be ported to the DKMS 
>> branch, where there are two different code paths to support older 
>> kernels that don't have mmu_notifier_get/put.
>>
> At DKMS branch alloc_notifier and free_notifer either exist together 
> or both not exist. So when HAVE_MMU_NOTIFIER_PUT is defined(new 
> kernel) it is ok.
>
> #ifdef HAVE_MMU_NOTIFIER_PUT
>         .alloc_notifier = kfd_process_alloc_notifier,
>         .free_notifier = kfd_process_free_notifier,
> #endif
>
> but when HAVE_MMU_NOTIFIER_PUT is not defined we need change 
> kfd_process_destroy_delayed since since it call kfd_unref_process(p);
>
> static void kfd_process_destroy_delayed(struct rcu_head *rcu)
> {
>         struct kfd_process *p = container_of(rcu, struct kfd_process, 
> rcu);
>
>         kfd_unref_process(p);
> }
>
> That means if port this patch to dkms branch when 
> HAVE_MMU_NOTIFIER_PUT is not defined(old kernel) we do not need call 
> kfd_process_destroy_delayed or remove mmu_notifier_call_srcu(&p->rcu, 
> &kfd_process_destroy_delayed)  at 
> kfd_process_notifier_release_internal. I think that make thing simpler 
> for old kernel.

No, we still need to destroy the kref that belongs to the process when 
the mm_struct is destroyed. We can't do that in 
kfd_process_notifier_release_internal because it leads to LOCKDEP 
issues. So we still need kfd_process_destroy_delayed.

Regards,
   Felix


>
> So it needs additional handling for old kernel on dkms branch. I do 
> not know who port patch to dkms branch, or I should change that on 
> dkms branch.
>
> Regards
>
> Xiaogang
>
>
>> Regards,
>>   Felix
>>
>>
>>>
>>> Regards
>>>
>>> Xiaogang
>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Xiaogang
>>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>   Felix
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 +++++---
>>>>>>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> index d07acf1b2f93..7c5471d7d743 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>>> @@ -899,8 +899,6 @@ struct kfd_process 
>>>>>>> *kfd_create_process(struct task_struct *thread)
>>>>>>> init_waitqueue_head(&process->wait_irq_drain);
>>>>>>>       }
>>>>>>>   out:
>>>>>>> -    if (!IS_ERR(process))
>>>>>>> -        kref_get(&process->ref);
>>>>>>>       mutex_unlock(&kfd_processes_mutex);
>>>>>>>       mmput(thread->mm);
>>>>>>>   @@ -1191,7 +1189,11 @@ static struct mmu_notifier 
>>>>>>> *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>>>>         srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>>>   -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>>>>> +    if (p) {
>>>>>>> +        kref_get(&p->ref);
>>>>>>> +        return &p->mmu_notifier;
>>>>>>> +    }
>>>>>>> +    return ERR_PTR(-ESRCH);
>>>>>>>   }
>>>>>>>     static void kfd_process_free_notifier(struct mmu_notifier *mn)


More information about the amd-gfx mailing list