[PATCH v2] drm/amdkfd: Not restore userptr buffer if kfd process has been removed

Chen, Xiaogang xiaogang.chen at amd.com
Wed Oct 9 22:50:15 UTC 2024


On 10/9/2024 4:36 PM, Felix Kuehling wrote:
>
> On 2024-10-04 11:54, Xiaogang.Chen wrote:
>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>
>> When kfd process has been terminated not restore userptr buffer after 
>> mmu
>> notifier invalidates a range.
>
> Is this fixing a real problem or a hypothetical problem? If there is a 
> real problem, can you include more information here? It looks to me 
> that amdgpu_amdkfd_restore_userptr_worker is already handling the 
> cases where a process or mm_struct no longer exists. Maybe the only 
> user visible change from this patch is, that you no longer print 
> "Failed to quiesce KFD" in a corner case of an MMU notifier for a 
> process that no longer exists?
>
> Or is there a problem with the lifetime of the process_info that 
> contains the work_struct?
>
My thought was that restore userptr buffer is not needed anyway if the 
correspondent kfd process has been removed. I noticed that during 
another work. I do not see if still restoring userptr buffer would cause 
issue when the kfd process not exist. Think avoid doing it is safer, and 
not see there is regression.

Regards

Xiaogang

>
>>
>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index ce5ca304dba9..6b4be7893dfb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct 
>> mmu_interval_notifier *mni,
>>           /* First eviction, stop the queues */
>>           r = kgd2kfd_quiesce_mm(mni->mm,
>>                          KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
>> -        if (r)
>> +
>> +        if (r && r != -ESRCH)
>>               pr_err("Failed to quiesce KFD\n");
>> -        queue_delayed_work(system_freezable_wq,
>> -            &process_info->restore_userptr_work,
>> -            msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>> +
>> +        if (r != -ESRCH) {
>
> Braces are not required because there is only one statement inside the 
> if.
>
> Regards,
>   Felix
>
>
>> + queue_delayed_work(system_freezable_wq,
>> +                &process_info->restore_userptr_work,
>> + msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>> +        }
>>       }
>>       mutex_unlock(&process_info->notifier_lock);


More information about the amd-gfx mailing list