[PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has been removed
Felix Kuehling
felix.kuehling at amd.com
Fri Oct 4 14:46:03 UTC 2024
On 2024-10-04 10:39, Russell, Kent wrote:
> [Public]
>
> My mistake.
>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Russell,
>> Kent
>> Sent: Friday, October 4, 2024 9:53 AM
>> To: Chen, Xiaogang <Xiaogang.Chen at amd.com>; Kuehling, Felix
>> <Felix.Kuehling at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Yang, Philip <Philip.Yang at amd.com>
>> Subject: RE: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
>> been removed
>>
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Chen,
>>> Xiaogang
>>> Sent: Thursday, October 3, 2024 6:11 PM
>>> To: Kuehling, Felix <Felix.Kuehling at amd.com>; amd-gfx at lists.freedesktop.org
>>> Cc: Yang, Philip <Philip.Yang at amd.com>
>>> Subject: Re: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
>>> been removed
>>>
>>>
>>> On 10/3/2024 4:11 PM, Felix Kuehling wrote:
>>>>
>>>> On 2024-10-03 16:55, Xiaogang.Chen wrote:
>>>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>>>
>>>>> When kfd process has been terminated not restore userptr buffer after
>>>>> mmu notifier invalidates a range.
>>>>>
>>>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
>>>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> index ce5ca304dba9..1df0926b63b3 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct
>>>>> mmu_interval_notifier *mni,
>>>>> /* First eviction, stop the queues */
>>>>> r = kgd2kfd_quiesce_mm(mni->mm,
>>>>> KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
>>>>> - if (r)
>>>>> +
>>>>> + if (r && r != -ESRCH)
>>>>> pr_err("Failed to quiesce KFD\n");
>>>>> - queue_delayed_work(system_freezable_wq,
>>>>> - &process_info->restore_userptr_work,
>>>>> - msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>>>>> +
>>>>> + if (!r || r != -ESRCH) {
>>>>
>>>> This condition is always true.
>>>>
>>> so sure why this condition is always true? kgd2kfd_quiesce_mm can
>>> return -ESRCH when it cannot find kfd process correspondent to mni->mm,
>>> then above checking will be false, then will not queue restore work item
>>> into system_freezable_wq.
>>
>> If you expand the 2 conditions, it becomes "if (r !=0 || r != -3)", which will always be
>> true for any value of r.
>>
> I got this wrong. So it's either r==0 or r==-3 (I need some caffeine). The function returns things back up from evict_queues, mqd_destroy, and can eventually return EIO or ETIME in the hqd_destroy function, so r can indeed be different values than 0/-3. Sorry for my confusion here.
It's (r == 0 || r != -3). So yeah, I was wrong, it's not always true. But it's still confusing because (r == 0) is redundant. I guess you could just write
if (r != -ESRCH) {
...
}
I don't know if that matches your intention.
Regards,
Felix
>
> Kent
>
>> Kent
>>
>>>
>>> Regards
>>>
>>> Xiaogang
>>>
>>>> Regards,
>>>> Felix
>>>>
>>>>
>>>>> + queue_delayed_work(system_freezable_wq,
>>>>> + &process_info->restore_userptr_work,
>>>>> + msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
>>>>> + }
>>>>> }
>>>>> mutex_unlock(&process_info->notifier_lock);
More information about the amd-gfx
mailing list