[PATCH] drm/amdgpu: Fix two reset triggered in a row
Felix Kuehling
felix.kuehling at amd.com
Tue Apr 23 18:05:15 UTC 2024
On 2024-04-23 01:50, Christian König wrote:
> On 22.04.24 at 21:45, Yunxiang Li wrote:
>> The reset request from KFD is missing a check for whether a reset is
>> already in progress, so a second reset is triggered right after the
>> previous one finishes. Add the check to align with the other reset
>> sources.
>
> NAK, that isn't how this should be handled.
>
> Instead, all reset sources that are handled by a previous reset should
> be canceled.
>
> In other words, there should be a cancel_work(&adev->kfd.reset_work);
> somewhere in the KFD code. If this doesn't work correctly, then that
> call is somehow missing.
>
> If you see amdgpu_in_reset() used outside of the low-level
> functions, then that is clearly a bug.

Do we need to do that for all reset workers in the driver separately? I
don't see where this is done for other reset workers.

Regards,
Felix

>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Yunxiang Li <Yunxiang.Li at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index 3b4591f554f1..ce3dbb1cc2da 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -283,7 +283,7 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev)
>> void amdgpu_amdkfd_gpu_reset(struct amdgpu_device *adev)
>> {
>> - if (amdgpu_device_should_recover_gpu(adev))
>> + if (amdgpu_device_should_recover_gpu(adev) && !amdgpu_in_reset(adev))
>> amdgpu_reset_domain_schedule(adev->reset_domain,
>> &adev->kfd.reset_work);
>> }
>
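
To make the two approaches in this thread concrete, here is a minimal
user-space sketch of the idea Christian describes: a reset that completes
cancels every other reset request that was already queued, so the next
worker does not trigger a second reset. This is a toy model, not driver
code. Every name in it (toy_device, toy_schedule_reset, toy_do_reset,
toy_drain, the SRC_* values) is hypothetical and does not exist in amdgpu;
the real driver would use the workqueue API on work items such as
adev->kfd.reset_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reset sources; only KFD is named in the thread. */
enum reset_source { SRC_KFD, SRC_JOB_TIMEOUT, SRC_COUNT };

/* Toy stand-in for a device with one queued-work flag per source. */
struct toy_device {
	bool pending[SRC_COUNT]; /* request queued but not yet run */
	int resets_done;         /* resets actually performed */
};

/* Queue a reset request; returns false if one is already queued. */
bool toy_schedule_reset(struct toy_device *dev, enum reset_source src)
{
	assert(src < SRC_COUNT);
	if (dev->pending[src])
		return false;
	dev->pending[src] = true;
	return true;
}

/*
 * Perform one reset on behalf of src. With cancel_others set, every
 * request still queued is dropped, because this reset already covered it.
 */
void toy_do_reset(struct toy_device *dev, enum reset_source src,
		  bool cancel_others)
{
	dev->pending[src] = false;
	dev->resets_done++;
	if (cancel_others)
		for (int i = 0; i < SRC_COUNT; i++)
			dev->pending[i] = false;
}

/* Run all queued requests in order; returns the total number of resets. */
int toy_drain(struct toy_device *dev, bool cancel_others)
{
	for (int i = 0; i < SRC_COUNT; i++)
		if (dev->pending[i])
			toy_do_reset(dev, (enum reset_source)i, cancel_others);
	return dev->resets_done;
}
```

With two requests queued, toy_drain(&dev, false) performs two resets in a
row, which is the symptom the patch describes, while toy_drain(&dev, true)
performs one. The sketch ignores concurrency and says nothing about where
such a cancel would belong in the driver, which is the open question in
the reply above.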