[PATCH 1/2] drm/amdgpu: use CPU to update VM during GPU reset

Christian König christian.koenig at amd.com
Tue Apr 2 11:40:00 UTC 2024


On 02.04.24 at 10:47, Yu, Lang wrote:
> [AMD Official Use Only - General]
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Friday, March 29, 2024 7:08 PM
>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH 1/2] drm/amdgpu: use CPU to update VM during GPU
>> reset
>>
>> On 25.03.24 at 06:35, Lang Yu wrote:
>>> During GPU reset the DRM scheduler is stopped and SDMA mode is not
>>> available, while CPU mode works well in that case.
>>>
>>> Use case,
>>> amdgpu_do_asic_reset
>>> amdgpu_device_ip_late_init
>>> umsch_mm_late_init
>>> umsch_mm_test
>>> amdgpu_vm_init
>> Well big NAK to that.
>>
>> The VM updates should just be scheduled and applied as soon as the GPU
>> reset is completed.
>>
>> The problem is rather that a GPU reset should *never* create a VM to do a
>> test. During GPU reset no memory allocation whatsoever is allowed.
> But user space can still create a VM via open("/dev/dri/card0", ...) during GPU reset;
> the driver doesn't prevent user space from doing that. So is this reasonable? Thanks.

Yes, the UMD can still create VMs during a reset, but this is completely
unproblematic because all submissions wait until the reset has completed
before they start executing.

This applies to both VM updates and userspace submissions.
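
As a rough illustration of that ordering, here is a toy model in plain C. It is 
not the drm scheduler or amdgpu code: every name (toy_sched, toy_submit, 
toy_reset_begin, toy_reset_end) is hypothetical, and a real scheduler is 
considerably more involved. It only shows that work submitted while a reset is 
in progress is queued and runs, in order, once the reset has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, not a kernel API. */
#define TOY_MAX_JOBS 8

struct toy_sched {
	bool in_reset;              /* true while the "GPU reset" runs */
	int pending[TOY_MAX_JOBS];  /* queued job ids, FIFO order */
	size_t num_pending;
	int executed[TOY_MAX_JOBS]; /* job ids in the order they ran */
	size_t num_executed;
};

/* Run everything that is queued, unless a reset is in progress. */
static void toy_run_pending(struct toy_sched *s)
{
	if (s->in_reset)
		return;

	for (size_t i = 0; i < s->num_pending; i++) {
		assert(s->num_executed < TOY_MAX_JOBS);
		s->executed[s->num_executed++] = s->pending[i];
	}
	s->num_pending = 0;
}

/* Submitting is always allowed; execution is what gets deferred. */
static bool toy_submit(struct toy_sched *s, int job_id)
{
	if (s->num_pending + s->num_executed >= TOY_MAX_JOBS)
		return false;

	s->pending[s->num_pending++] = job_id;
	toy_run_pending(s);
	return true;
}

static void toy_reset_begin(struct toy_sched *s)
{
	s->in_reset = true;
}

static void toy_reset_end(struct toy_sched *s)
{
	s->in_reset = false;
	toy_run_pending(s); /* deferred jobs start only now */
}
```

In this sketch a VM update and a userspace submission would both simply be 
jobs passed to toy_submit(); neither needs a special path during the reset.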

Regards,
Christian.

>
> Regards,
> Lang
>
>> That's why we only do IB and ring tests with a pre-allocated memory pool
>> during a GPU reset.
>>
>> If the umsch_mm_test abuses the VM tests like this then please remove that
>> code immediately.
>>
>> Regards,
>> Christian.
>>
>>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 8af3f0fd3073..af53f9cfcc40 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -2404,8 +2404,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>
>>>       vm->is_compute_context = false;
>>>
>>> -    vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
>>> -                                AMDGPU_VM_USE_CPU_FOR_GFX);
>>> +    vm->use_cpu_for_update = !!(amdgpu_in_reset(adev) ||
>>> +            adev->vm_manager.vm_update_mode & AMDGPU_VM_USE_CPU_FOR_GFX);
>>>       DRM_DEBUG_DRIVER("VM update mode is %s\n",
>>>                        vm->use_cpu_for_update ? "CPU" : "SDMA");


