[PATCH v2] drm/amdkfd: Not restore userptr buffer if kfd process has been removed
Felix Kuehling
felix.kuehling at amd.com
Thu Oct 10 18:54:35 UTC 2024
On 2024-10-09 18:50, Chen, Xiaogang wrote:
>
> On 10/9/2024 4:36 PM, Felix Kuehling wrote:
>>
>> On 2024-10-04 11:54, Xiaogang.Chen wrote:
>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>
>>> When kfd process has been terminated not restore userptr buffer
>>> after mmu
>>> notifier invalidates a range.
>>
>> Is this fixing a real problem or a hypothetical problem? If there is
>> a real problem, can you include more information here? It looks to me
>> that amdgpu_amdkfd_restore_userptr_worker is already handling the
>> cases where a process or mm_struct no longer exists. Maybe the only
>> user visible change from this patch is, that you no longer print
>> "Failed to quiesce KFD" in a corner case of an MMU notifier for a
>> process that no longer exists?
>>
>> Or is there a problem with the lifetime of the process_info that
>> contains the work_struct?
>>
> My thought was that restore userptr buffer is not needed anyway if the
> correspondent kfd process has been removed. I noticed that during
> another work. I do not see if still restoring userptr buffer would
> cause issue when the kfd process not exist. Think avoid doing it is
> safer, and not see there is regression.
Well, safer is usually not to change things that aren't broken. Anyway,
this seems reasonable enough. With the braces below removed, the patch is
Reviewed-by: Felix Kuehling <felix.kuehling at amd.com>
>
> Regards
>
> Xiaogang
>
>>
>>>
>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index ce5ca304dba9..6b4be7893dfb 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct
>>> mmu_interval_notifier *mni,
>>> /* First eviction, stop the queues */
>>> r = kgd2kfd_quiesce_mm(mni->mm,
>>> KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
>>> - if (r)
>>> +
>>> + if (r && r != -ESRCH)
>>> pr_err("Failed to quiesce KFD\n");
>>> - queue_delayed_work(system_freezable_wq,
>>> - &process_info->restore_userptr_work,
>>> - msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>>> +
>>> + if (r != -ESRCH) {
>>
>> Braces are not required because there is only one statement inside
>> the if.
>>
>> Regards,
>> Felix
>>
>>
>>> + queue_delayed_work(system_freezable_wq,
>>> + &process_info->restore_userptr_work,
>>> + msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>>> + }
>>> }
>>> mutex_unlock(&process_info->notifier_lock);
More information about the amd-gfx
mailing list