[PATCH] drm/amdkfd: change kfd process kref count at creation
Chen, Xiaogang
xiaogang.chen at amd.com
Thu Oct 10 20:19:04 UTC 2024
On 10/10/2024 2:01 PM, Felix Kuehling wrote:
>
> On 2024-10-09 18:16, Chen, Xiaogang wrote:
>>
>> On 10/9/2024 4:45 PM, Felix Kuehling wrote:
>>>
>>> On 2024-10-09 17:02, Chen, Xiaogang wrote:
>>>>
>>>> On 10/9/2024 3:38 PM, Felix Kuehling wrote:
>>>>> On 2024-10-09 14:08, Xiaogang.Chen wrote:
>>>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>>>
>>>>>> kfd process kref count(process->ref) is initialized to 1 by
>>>>>> kref_init. After
>>>>>> it is created not need to increaes its kref. Instad add kfd
>>>>>> process kref at kfd
>>>>>> process mmu notifier allocation since we decrease the ref at
>>>>>> free_notifier of
>>>>>> mmu_notifier_ops, so pair them.
>>>>>
>>>>> That's not correct. kfd_create_process returns a struct
>>>>> kfd_process * reference. That gets stored by the caller in
>>>>> filep->private_data. That requires incrementing the reference
>>>>> count. You can have multiple references to the same struct
>>>>> kfd_process if user mode opens /dev/kfd multiple times. The
>>>>> reference is released in kfd_release. Your change breaks that use
>>>>> case.
>>>>>
>>>> ok, if user mode open and close /dev/kfd multiple times(current
>>>> Thunk only allows user process open the kfd node once) the change
>>>> will break this use case.
>>>>> The reference released in kfd_process_free_notifier is only one
>>>>> per process and it's the reference created by kref_init.
>>>>
>>>> I think we can increase kref if find_process return true(the user
>>>> process already created kfd process). If find_process return false
>>>> do not increase kref since kref_init has been set to 1.
>>>>
>>>> Or change find_process(thread, false) to find_process(thread, true)
>>>> that will increase kref if it finds kfd process has been created.
>>>>
>>>> The idea is to pair kref update between alloc_notifier and
>>>> free_notifier of mmu_notifier_ops for same process(mm). That would
>>>> seem natural.
>>>
>>> What's the problem you're trying to solve? Is it just a cosmetic
>>> issue? The MMU notifier is registered in create_process (not
>>> kfd_create_process). If you add a kref_get in
>>> kfd_process_alloc_notifier you need to kfd_unref_process somewhere
>>> in create_process. I don't think it's worth the trouble and only
>>> risks introducing more reference counting bugs.
>>
>> It is for making code cleaner or natural to read. mmu_notifier_get is
>> the last call at create_process. If mmu_notifier_get fail the process
>> is freed: kfree(process). If create_process success
>> kfd_create_process return that process anyway(after create_process
>> kfd_create_process creates sys entries that not affect return created
>> kfd process). The finally result is same that kref is 2: one for kfd
>> process creation, one for mmu notifier allocation.
>
> Currently, when you call kfd_create_process for the first time, it
> returns with kref=2. One reference for the MMU notifier, and one for
> file->private_data.
>
> Subsequent invocations of kfd_create_process when the process already
> exists should increment the kref by one to track the additional
> reference put into the new file->private_data.
one ways is changing find_process(thread, false) to find_process(thread,
true) at kfd_create_process. When kfd process already exist find_process
will call kref_get(&p->ref);
>
>
> If you can come up with a patch that preserves this logic _and makes
> the code simpler and more readable_, I will consider approving it.
> Also keep in mind that your patch would need to be ported to the DKMS
> branch, where there are two different code paths to support older
> kernels that don't have mmu_notifier_get/put.
>
At DKMS branch alloc_notifier and free_notifer either exist together or
both not exist. So when HAVE_MMU_NOTIFIER_PUT is defined(new kernel) it
is ok.
#ifdef HAVE_MMU_NOTIFIER_PUT
.alloc_notifier = kfd_process_alloc_notifier,
.free_notifier = kfd_process_free_notifier,
#endif
but when HAVE_MMU_NOTIFIER_PUT is not defined we need change
kfd_process_destroy_delayed since since it call kfd_unref_process(p);
static void kfd_process_destroy_delayed(struct rcu_head *rcu)
{
struct kfd_process *p = container_of(rcu, struct kfd_process, rcu);
kfd_unref_process(p);
}
That means if port this patch to dkms branch when HAVE_MMU_NOTIFIER_PUT
is not defined(old kernel) we do not need call
kfd_process_destroy_delayed or remove mmu_notifier_call_srcu(&p->rcu,
&kfd_process_destroy_delayed) at kfd_process_notifier_release_internal.
I think that make thing simpler for old kernel.
So it needs additional handling for old kernel on dkms branch. I do not
know who port patch to dkms branch, or I should change that on dkms branch.
Regards
Xiaogang
> Regards,
> Felix
>
>
>>
>> Regards
>>
>> Xiaogang
>>
>>> Regards,
>>> Felix
>>>
>>>
>>>>
>>>> Regards
>>>>
>>>> Xiaogang
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Felix
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 +++++---
>>>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> index d07acf1b2f93..7c5471d7d743 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>>> @@ -899,8 +899,6 @@ struct kfd_process *kfd_create_process(struct
>>>>>> task_struct *thread)
>>>>>> init_waitqueue_head(&process->wait_irq_drain);
>>>>>> }
>>>>>> out:
>>>>>> - if (!IS_ERR(process))
>>>>>> - kref_get(&process->ref);
>>>>>> mutex_unlock(&kfd_processes_mutex);
>>>>>> mmput(thread->mm);
>>>>>> @@ -1191,7 +1189,11 @@ static struct mmu_notifier
>>>>>> *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>>> srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>>> - return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>>>> + if (p) {
>>>>>> + kref_get(&p->ref);
>>>>>> + return &p->mmu_notifier;
>>>>>> + }
>>>>>> + return ERR_PTR(-ESRCH);
>>>>>> }
>>>>>> static void kfd_process_free_notifier(struct mmu_notifier *mn)
More information about the amd-gfx
mailing list