[PATCH 3/3] drm/amdkfd: destroy_pdds release pdd->drm_file at end

Philip Yang yangp at amd.com
Fri May 16 13:07:24 UTC 2025


On 2025-05-15 17:31, Chen, Xiaogang wrote:
>
> On 5/15/2025 3:45 PM, Philip Yang wrote:
>>
>> On 2025-05-15 10:29, Chen, Xiaogang wrote:
>>>
>>> Does this patch fix a bug or just make code look more reasonable? 
>>> kfd_process_destroy_pdds releases pdd related buffers, not related 
>>> to operations on vm. So vm tear down does not affect this function.
>>>
>> This change doesn't fix anything currently. fput(pdd->drm_file), which
>> may free the vm, sits right between freeing the vm mappings of
>> qpd->cwsr_mem and qpd->ib_mem and freeing the kernel bos
>> qpd->proc_doorbells and pdd->proc_ctx_bo. The move makes the ordering
>> clear for future changes.
>
> Then the current place for fput(pdd->drm_file) makes more sense:
> unmapping the vm mappings of qpd->cwsr_mem and qpd->ib_mem is the last
> place where the kfd process release procedure needs the vm alive. After
> that, the kfd process release no longer needs the vm and only releases
> the remaining buffers. So release drm_file as soon as we no longer need
> to hold it.

The issue was that vm_fini showed the error message "still active bo
inside vm" roughly 1 time in 1000. It took a while to trace down the
leaking vm mapping: the seq64 memory mapping was leaking, which is fixed
by the first patch. In the KFD pdd cleanup path, freeing the vm sits in
the middle of freeing pdd memory, and this was one of the suspected
sources of the vm_fini race. We may add new pdd memory mappings to the
vm in the future, so to prevent a potential vm_fini race, this patch
moves freeing the vm to after all pdd memory is freed and adds a comment.
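
To illustrate the ordering concern, here is a minimal userspace sketch,
not kernel code. All names in it (model_vm, model_vm_put, model_unmap,
model_destroy_pdd) are hypothetical and only model the idea: dropping
the last vm reference while a mapping is still present is what would
trigger the "still active bo inside vm" complaint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a GPU vm: tracks references and live mappings. */
struct model_vm {
	int refcount;         /* references held, e.g. through the drm file */
	int active_mappings;  /* buffers still mapped into this vm */
	bool finalized;       /* set once the last reference is dropped */
	bool leaked_at_fini;  /* set if torn down with mappings still present */
};

/* Models fput(pdd->drm_file): dropping the last reference tears down the vm. */
static void model_vm_put(struct model_vm *vm)
{
	assert(vm->refcount > 0);
	if (--vm->refcount == 0) {
		vm->finalized = true;
		if (vm->active_mappings > 0) {
			vm->leaked_at_fini = true;
			fprintf(stderr, "model: still active bo inside vm\n");
		}
	}
}

/* Models unmapping one buffer; it has no effect once the vm is gone. */
static void model_unmap(struct model_vm *vm)
{
	if (vm->finalized)
		return;
	assert(vm->active_mappings > 0);
	vm->active_mappings--;
}

/*
 * Models one pdd teardown with nr_mappings mapped buffers, where only
 * unmapped_before_put of them are unmapped before the vm reference is
 * dropped. Returns true if the vm was torn down with no mappings left.
 */
static bool model_destroy_pdd(int nr_mappings, int unmapped_before_put)
{
	struct model_vm vm = { .refcount = 1, .active_mappings = nr_mappings };
	int i;

	for (i = 0; i < unmapped_before_put; i++)
		model_unmap(&vm);

	model_vm_put(&vm);

	/* Any unmap placed after the put arrives too late in this model. */
	for (; i < nr_mappings; i++)
		model_unmap(&vm);

	return !vm.leaked_at_fini;
}
```

In this model, model_destroy_pdd(3, 3) is clean, while
model_destroy_pdd(3, 2) stands for a future mapping whose unmap was
added after the put. Doing the put as the very last step makes the
second case impossible by construction.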

Regards,

Philip

>
> Regards
>
> Xiaogang
>
>> Regards,
>>
>> Philip
>>
>>> Regards
>>>
>>> Xiaogang
>>>
>>> On 5/14/2025 12:10 PM, Philip Yang wrote:
>>>> Release pdd->drm_file may free the vm if this is the last reference,
>>>> move it to the last step after memory is unmapped.
>>>>
>>>> Signed-off-by: Philip Yang<Philip.Yang at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 +++++++---
>>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index e868cc8da46f..b009c852180d 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -1063,9 +1063,6 @@ static void kfd_process_destroy_pdds(struct 
>>>> kfd_process *p)
>>>>           kfd_process_device_destroy_cwsr_dgpu(pdd);
>>>>           kfd_process_device_destroy_ib_mem(pdd);
>>>>   -        if (pdd->drm_file)
>>>> -            fput(pdd->drm_file);
>>>> -
>>>>           if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
>>>>               free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>>>>                   get_order(KFD_CWSR_TBA_TMA_SIZE));
>>>> @@ -1088,6 +1085,13 @@ static void kfd_process_destroy_pdds(struct 
>>>> kfd_process *p)
>>>>               pdd->runtime_inuse = false;
>>>>           }
>>>>   +        /*
>>>> +         * This may release the vm if application already close 
>>>> the drm node,
>>>> +         * do it as last step after memory unmapped.
>>>> +         */
>>>> +        if (pdd->drm_file)
>>>> +            fput(pdd->drm_file);
>>>> +
>>>>           kfree(pdd);
>>>>           p->pdds[i] = NULL;
>>>>       }

