[PATCH 3/3] drm/amdkfd: destroy_pdds release pdd->drm_file at end

Chen, Xiaogang xiaogang.chen at amd.com
Fri May 16 15:19:23 UTC 2025


On 5/16/2025 8:07 AM, Philip Yang wrote:
>
> On 2025-05-15 17:31, Chen, Xiaogang wrote:
>>
>> On 5/15/2025 3:45 PM, Philip Yang wrote:
>>>
>>> On 2025-05-15 10:29, Chen, Xiaogang wrote:
>>>>
>>>> Does this patch fix a bug or just make code look more reasonable? 
>>>> kfd_process_destroy_pdds releases pdd related buffers, not related 
>>>> to operations on vm. So vm tear down dose not affect this function.
>>>>
>>> This change doesn't fix anything currently, as fput(pdd->drm_file) 
>>> to free vm is right between free vm mapping qpd->cwsr_mem, 
>>> qpd->ib_mem and free kernel bo qpd->proc_doorbells, 
>>> pdd->proc_ctx_bo, to make it clear for future change.
>>
>> Then the current place to do fput(pdd->drm_file) make more sense: 
>> unmap vm mapping of qpd->cwsr_mem, qpd->ib_mem is the last place 
>> where kfd process release procedure needs vm alive. After that the 
>> kfd process release does not need vm alive. It then releases 
>> remaining buffers. So release drm_file as soon as we do not need hold 
>> it.
>
> The issue was  vm_fini shows error message "still active bo inside vm" 
> (1/1000) chance, took a while to trace down the leaking vm mapping, 
> the issue is seq64 memory mapping leaking and fixed by the first 
> patch. KFD pdd cleanup path, free vm is in the middle of free pdd 
> memory, this is one of the suspicious vm_fini race. We may add new pdd 
> memory mapping to vm in future, to prevent the potential vm_fini race, 
> this patch move free vm to after all pdd memory is freed and add comment.

I see the reason of [PATCH 1/3]. This patch is delay kfd's pdd drm_file 
release a bit. kfd should release drm_file as soon as it does not need 
vm. The issue you saw is there is buffer mapping still alive when driver 
decides to tear down vm.  Is the mapping from from kfd process? if not, 
change timing somehow at kfd process release is not right place.

Regards

Xiaogang

>
> Regards,
>
> Philip
>
>>
>> Regards
>>
>> Xiaogang
>>
>>> Regards,
>>>
>>> Philip
>>>
>>>> Regards
>>>>
>>>> Xiaogang
>>>>
>>>> On 5/14/2025 12:10 PM, Philip Yang wrote:
>>>>> Release pdd->drm_file may free the vm if this is the last reference,
>>>>> move it to the last step after memory is unmapped.
>>>>>
>>>>> Signed-off-by: Philip Yang<Philip.Yang at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 +++++++---
>>>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> index e868cc8da46f..b009c852180d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> @@ -1063,9 +1063,6 @@ static void kfd_process_destroy_pdds(struct 
>>>>> kfd_process *p)
>>>>>           kfd_process_device_destroy_cwsr_dgpu(pdd);
>>>>>           kfd_process_device_destroy_ib_mem(pdd);
>>>>>   -        if (pdd->drm_file)
>>>>> -            fput(pdd->drm_file);
>>>>> -
>>>>>           if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
>>>>>               free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>>>>>                   get_order(KFD_CWSR_TBA_TMA_SIZE));
>>>>> @@ -1088,6 +1085,13 @@ static void kfd_process_destroy_pdds(struct 
>>>>> kfd_process *p)
>>>>>               pdd->runtime_inuse = false;
>>>>>           }
>>>>>   +        /*
>>>>> +         * This may release the vm if application already close 
>>>>> the drm node,
>>>>> +         * do it as last step after memory unmapped.
>>>>> +         */
>>>>> +        if (pdd->drm_file)
>>>>> +            fput(pdd->drm_file);
>>>>> +
>>>>>           kfree(pdd);
>>>>>           p->pdds[i] = NULL;
>>>>>       }


More information about the amd-gfx mailing list