[PATCH 3/3] drm/amdkfd: destroy_pdds release pdd->drm_file at end
Chen, Xiaogang
xiaogang.chen at amd.com
Fri May 16 15:19:23 UTC 2025
On 5/16/2025 8:07 AM, Philip Yang wrote:
>
> On 2025-05-15 17:31, Chen, Xiaogang wrote:
>>
>> On 5/15/2025 3:45 PM, Philip Yang wrote:
>>>
>>> On 2025-05-15 10:29, Chen, Xiaogang wrote:
>>>>
>>>> Does this patch fix a bug or just make code look more reasonable?
>>>> kfd_process_destroy_pdds releases pdd related buffers, not related
>>>> to operations on vm. So vm tear down dose not affect this function.
>>>>
>>> This change doesn't fix anything currently, as fput(pdd->drm_file)
>>> to free vm is right between free vm mapping qpd->cwsr_mem,
>>> qpd->ib_mem and free kernel bo qpd->proc_doorbells,
>>> pdd->proc_ctx_bo, to make it clear for future change.
>>
>> Then the current place to do fput(pdd->drm_file) make more sense:
>> unmap vm mapping of qpd->cwsr_mem, qpd->ib_mem is the last place
>> where kfd process release procedure needs vm alive. After that the
>> kfd process release does not need vm alive. It then releases
>> remaining buffers. So release drm_file as soon as we do not need hold
>> it.
>
> The issue was vm_fini shows error message "still active bo inside vm"
> (1/1000) chance, took a while to trace down the leaking vm mapping,
> the issue is seq64 memory mapping leaking and fixed by the first
> patch. KFD pdd cleanup path, free vm is in the middle of free pdd
> memory, this is one of the suspicious vm_fini race. We may add new pdd
> memory mapping to vm in future, to prevent the potential vm_fini race,
> this patch move free vm to after all pdd memory is freed and add comment.
I see the reason of [PATCH 1/3]. This patch is delay kfd's pdd drm_file
release a bit. kfd should release drm_file as soon as it does not need
vm. The issue you saw is there is buffer mapping still alive when driver
decides to tear down vm. Is the mapping from from kfd process? if not,
change timing somehow at kfd process release is not right place.
Regards
Xiaogang
>
> Regards,
>
> Philip
>
>>
>> Regards
>>
>> Xiaogang
>>
>>> Regards,
>>>
>>> Philip
>>>
>>>> Regards
>>>>
>>>> Xiaogang
>>>>
>>>> On 5/14/2025 12:10 PM, Philip Yang wrote:
>>>>> Release pdd->drm_file may free the vm if this is the last reference,
>>>>> move it to the last step after memory is unmapped.
>>>>>
>>>>> Signed-off-by: Philip Yang<Philip.Yang at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 +++++++---
>>>>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> index e868cc8da46f..b009c852180d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>>> @@ -1063,9 +1063,6 @@ static void kfd_process_destroy_pdds(struct
>>>>> kfd_process *p)
>>>>> kfd_process_device_destroy_cwsr_dgpu(pdd);
>>>>> kfd_process_device_destroy_ib_mem(pdd);
>>>>> - if (pdd->drm_file)
>>>>> - fput(pdd->drm_file);
>>>>> -
>>>>> if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
>>>>> free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
>>>>> get_order(KFD_CWSR_TBA_TMA_SIZE));
>>>>> @@ -1088,6 +1085,13 @@ static void kfd_process_destroy_pdds(struct
>>>>> kfd_process *p)
>>>>> pdd->runtime_inuse = false;
>>>>> }
>>>>> + /*
>>>>> + * This may release the vm if application already close
>>>>> the drm node,
>>>>> + * do it as last step after memory unmapped.
>>>>> + */
>>>>> + if (pdd->drm_file)
>>>>> + fput(pdd->drm_file);
>>>>> +
>>>>> kfree(pdd);
>>>>> p->pdds[i] = NULL;
>>>>> }
More information about the amd-gfx
mailing list