[PATCH v3 7/7] drm/amdgpu: Stop any pending reset if another in progress.

Felix Kuehling felix.kuehling at amd.com
Tue May 31 15:35:34 UTC 2022


On 2022-05-31 at 11:31, Felix Kuehling wrote:
> On 2022-05-25 at 15:04, Andrey Grodzovsky wrote:
>> We skip reset requests if another one is already in progress.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++++++++++++++++++++++
>>   1 file changed, 27 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 424571e46cf5..e1f7ee604ea4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5054,6 +5054,27 @@ static void amdgpu_device_recheck_guilty_jobs(
>>       }
>>   }
>> +static inline void amdggpu_device_stop_pedning_resets(struct amdgpu_device* adev)
>
> Typo: pedning -> pending
>
>
>> +{
>> +    struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>> +
>> +#if defined(CONFIG_DEBUG_FS)
>> +    if (!amdgpu_sriov_vf(adev))
>> +        cancel_work(&adev->reset_work);
>> +#endif
>> +
>> +    if (adev->kfd.dev)
>> +        cancel_work(&adev->kfd.reset_work);
>
> Do you also need to cancel resets from other GPUs in the same hive?

Never mind. I see this is called in a loop over the GPUs in 
amdgpu_device_gpu_recover.
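
A minimal userspace sketch of that per-GPU loop, for illustration only. It assumes mock types (`mock_work`, `mock_gpu`) and helpers (`mock_cancel_work`, `mock_stop_pending_resets`, `mock_recover_hive`) that are hypothetical stand-ins and not part of amdgpu; the CONFIG_DEBUG_FS guard and the RAS context lookup from the patch are omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel work item (not struct work_struct). */
struct mock_work {
    bool pending;
};

/* Hypothetical per-GPU state mirroring the four reset sources in the patch. */
struct mock_gpu {
    bool is_sriov_vf;                /* amdgpu_sriov_vf(adev) in the patch */
    bool has_kfd;                    /* adev->kfd.dev != NULL */
    bool ras_enabled;                /* con && adev->ras_enabled */
    struct mock_work debugfs_reset;  /* adev->reset_work */
    struct mock_work kfd_reset;      /* adev->kfd.reset_work */
    struct mock_work flr;            /* adev->virt.flr_work */
    struct mock_work ras_recovery;   /* con->recovery_work */
};

/* Like cancel_work(): clear a pending item and report whether it was pending. */
static bool mock_cancel_work(struct mock_work *w)
{
    bool was_pending = w->pending;

    w->pending = false;
    return was_pending;
}

/* Drop pending non-scheduler resets on one GPU; returns how many were dropped. */
static int mock_stop_pending_resets(struct mock_gpu *gpu)
{
    int cancelled = 0;

    if (!gpu->is_sriov_vf)
        cancelled += mock_cancel_work(&gpu->debugfs_reset);
    if (gpu->has_kfd)
        cancelled += mock_cancel_work(&gpu->kfd_reset);
    if (gpu->is_sriov_vf)
        cancelled += mock_cancel_work(&gpu->flr);
    if (gpu->ras_enabled)
        cancelled += mock_cancel_work(&gpu->ras_recovery);
    return cancelled;
}

/* The recovery path visits every GPU in the hive, so each one gets cleaned up. */
static int mock_recover_hive(struct mock_gpu *gpus, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += mock_stop_pending_resets(&gpus[i]);
    return total;
}
```

Because the helper runs once per device inside the loop, no separate hive-wide cancellation step is needed in this model.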

Other than the typo, this patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


>
> Regards,
>   Felix
>
>
>> +
>> +    if (amdgpu_sriov_vf(adev))
>> +        cancel_work(&adev->virt.flr_work);
>> +
>> +    if (con && adev->ras_enabled)
>> +        cancel_work(&con->recovery_work);
>> +
>> +}
>> +
>> +
>>   /**
>>    * amdgpu_device_gpu_recover - reset the asic and recover scheduler
>>    *
>> @@ -5209,6 +5230,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>                     r, adev_to_drm(tmp_adev)->unique);
>>               tmp_adev->asic_reset_res = r;
>>           }
>> +
>> +        /*
>> +         * Drop all pending non scheduler resets. Scheduler resets
>> +         * were already dropped during drm_sched_stop
>> +         */
>> +        amdggpu_device_stop_pedning_resets(tmp_adev);
>>       }
>>       tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));

