[PATCH v2] drm/amdkfd: change kfd process kref count at creation

Chen, Xiaogang xiaogang.chen at amd.com
Mon Oct 14 15:07:42 UTC 2024


On 10/13/2024 8:55 PM, Zhu Lingshan wrote:
> On 10/13/2024 1:30 AM, Chen, Xiaogang wrote:
>> On 10/11/2024 9:56 PM, Zhu Lingshan wrote:
>>> On 10/11/2024 10:41 PM, Xiaogang.Chen wrote:
>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>
>>>> The kfd process kref count (process->ref) is initialized to 1 by kref_init.
>>>> There is no need to increase the kref again after the process is created.
>>>> Instead, take a kfd process kref at kfd process mmu notifier allocation, since
>>>> we drop that ref in free_notifier of mmu_notifier_ops, so that the two are paired.
>>>>
>>>> When a user process opens the kfd node multiple times, the kfd process kref is
>>>> increased on each open to balance the corresponding kfd node close operation.
>>>>
>>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 15 ++++++++++-----
>>>>    1 file changed, 10 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index d07acf1b2f93..78bf918abf92 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -850,8 +850,10 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>            goto out;
>>>>        }
>>>>
>>>> -    /* A prior open of /dev/kfd could have already created the process. */
>>>> -    process = find_process(thread, false);
>>>> +    /* A prior open of /dev/kfd could have already created the process.
>>>> +     * find_process will increase process kref in this case
>>>> +     */
>>>> +    process = find_process(thread, true);
>>>>        if (process) {
>>>>            pr_debug("Process already found\n");
>>>>        } else {
>>>> @@ -899,8 +901,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
>>>>            init_waitqueue_head(&process->wait_irq_drain);
>>>>        }
>>>>    out:
>>>> -    if (!IS_ERR(process))
>>>> -        kref_get(&process->ref);
>>>>        mutex_unlock(&kfd_processes_mutex);
>>>>        mmput(thread->mm);
>>>>
>>>> @@ -1191,7 +1191,12 @@ static struct mmu_notifier *kfd_process_alloc_notifier(struct mm_struct *mm)
>>>>          srcu_read_unlock(&kfd_processes_srcu, idx);
>>>>
>>>> -    return p ? &p->mmu_notifier : ERR_PTR(-ESRCH);
>>>> +    if (p) {
>>>> +        kref_get(&p->ref);
>>>> +        return &p->mmu_notifier;
>>>> +    }
>>>> +
>>>> +    return ERR_PTR(-ESRCH);
>>> This callback should only allocate the notifier (here it returns an existing notifier),
>>> so I am not sure this is a better place to increase the kref; it seems to couple
>>> two loosely related routines.
>>>
>>> The kref is decreased in the free notifier, but that does not mean it has to be increased in the alloc notifier.
>> Whoever takes a reference on the kfd process should also drop it when finished. A client should not drop a reference that it did not take. That keeps the count balanced in a clean way.
> I think we already do so: every function that calls kfd_lookup_process_by_xxx drops the kref of the kfd_process afterwards.
>> The current code uses the mmu free notifier to drop the kref that was added at kfd process creation. For example, if the mmu notifier were not used, there would be an extra kref that prevents the kfd process from being released.
> I am not sure this is about pairing. The current design drops the last kref from the mmu free notifier when the whole program exits, which destroys the kfd_process.
> The mmu free notifier is certain to be invoked since it has been registered.

This patch is about having "get/put" at the correct places, that is, keeping the
kref balanced in a clean way. We 'put' the kref in the mmu free notifier, so why
not 'get' the kref in the mmu registration (alloc) notifier?
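
As a rough illustration of the pairing, here is a user-space sketch in plain C.
It is not kernel code: toy_kref and every toy_* name are made up for this
example, and it only models the counting this patch aims for (kref_init gives
the first open its ref, each later open takes one, each close drops one, and
the notifier alloc/free callbacks take and drop their own).

```c
#include <assert.h>

/* Toy stand-in for struct kref; hypothetical, not the kernel implementation. */
struct toy_kref { int count; };

static void toy_kref_init(struct toy_kref *k) { k->count = 1; }
static void toy_kref_get(struct toy_kref *k)  { k->count++; }
/* Returns 1 when the last reference was dropped. */
static int toy_kref_put(struct toy_kref *k)   { return --k->count == 0; }

/* Toy stand-in for struct kfd_process. */
struct toy_process {
	struct toy_kref ref;
	int released;
};

static void toy_unref(struct toy_process *p)
{
	if (toy_kref_put(&p->ref))
		p->released = 1;
}

/* First open creates the process; the initial ref belongs to this open. */
static void toy_first_open(struct toy_process *p)
{
	toy_kref_init(&p->ref);
	p->released = 0;
}

/* Later opens find the existing process and take a ref each. */
static void toy_reopen(struct toy_process *p) { toy_kref_get(&p->ref); }

/* Each close drops the ref taken by the matching open. */
static void toy_close(struct toy_process *p) { toy_unref(p); }

/* Notifier alloc takes a ref; notifier free drops that same ref. */
static void toy_alloc_notifier(struct toy_process *p) { toy_kref_get(&p->ref); }
static void toy_free_notifier(struct toy_process *p)  { toy_unref(p); }

/* Run one lifecycle with `opens` (>= 1) opens; returns 1 if released at the end. */
static int toy_lifecycle(int opens)
{
	struct toy_process p;
	int i;

	toy_first_open(&p);
	toy_alloc_notifier(&p);
	for (i = 1; i < opens; i++)
		toy_reopen(&p);

	for (i = 0; i < opens; i++)
		toy_close(&p);
	/* Only the notifier's ref is left, so the process is still alive. */
	assert(!p.released && p.ref.count == 1);

	toy_free_notifier(&p);
	return p.released;
}
```

Calling toy_lifecycle(1) or toy_lifecycle(3) ends with the process released
only after the free notifier runs, and every 'put' has a matching 'get' at the
place that owns it.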

Regards

Xiaogang

>
> Thanks
> Lingshan
>> The final kref count is the same. The patch just achieves the balance in a logical way.
>>
>> Regards
>>
>> Xiaogang
>>
>>> Thanks
>>> Lingshan
>>>
>>>>      static void kfd_process_free_notifier(struct mmu_notifier *mn)

