[PATCH 1/1] amdgpu/soc15: enable asic reset for dGPU in case of suspend abort

Mario Limonciello mario.limonciello at amd.com
Wed Jan 22 02:58:48 UTC 2025


On 1/20/2025 23:45, Lazar, Lijo wrote:
> 
> 
> On 1/13/2025 9:10 AM, Jiang Liu wrote:
>> When GPU suspend is aborted, do the same for dGPU as APU to reset
>> soc15 asic. Otherwise it may cause following errors:
>> [  547.229463] amdgpu 0001:81:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
>>
>> [  555.126827] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
>> [  555.126901] [drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed
>> [  555.126957] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_4_3> failed -110
>> [  555.126959] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
>> [  555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110
>> [  555.126966] PM: Device 0000:0a:00.0 failed to resume async: error -110
>>
>> This fix has been tested on Mi308X.
>>
>> Signed-off-by: Jiang Liu <gerry at linux.alibaba.com>
>> Tested-by: Shuo Liu <shuox.liu at linux.alibaba.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++-----
>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> index a59b4c36cad7..0e1daefd1a8e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>> @@ -605,12 +605,10 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
>>   static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
>>   {
>>   	/* Will reset for the following suspend abort cases.
>> -	 * 1) Only reset on APU side, dGPU hasn't checked yet.
>> -	 * 2) S3 suspend aborted in the normal S3 suspend or
>> -	 *    performing pm core test.
>> +	 * 1) S3 suspend aborted in the normal S3 suspend
>> +	 * 2) S3 suspend aborted in performing pm core test.
>>   	 */
>> -	if (adev->flags & AMD_IS_APU && adev->in_s3 &&
>> -			!pm_resume_via_firmware())
>> +	if (adev->in_s3 && !pm_resume_via_firmware())
>>   		return true;
> 
> I don't think this can be applied to all environments. For ex: this may
> not be applicable for dGPUs combined with ARM CPUs.

I looked through amdgpu_acpi cases and I'm not sure I agree with this.
On the ARM side amdgpu_acpi_is_s3_active() should never return true
because ARM doesn't support PM_SUSPEND_MEM.

This means that amdgpu_choose_low_power_state() shouldn't set in_s3 either.

So I don't think this block will run on the ARM side.
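
The chain of reasoning above can be sketched as a small user-space model.
This is only an illustration, not the driver code: every model_* name and
struct below is hypothetical, the supports_suspend_mem field encodes the
assumption that the ARM systems in question have no PM_SUSPEND_MEM, and
only the condition from the quoted diff is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's suspend targets. */
enum model_suspend_target {
	MODEL_SUSPEND_TO_IDLE,	/* stands in for PM_SUSPEND_TO_IDLE */
	MODEL_SUSPEND_MEM,	/* stands in for PM_SUSPEND_MEM (S3) */
};

/* Hypothetical platform description, not a kernel structure. */
struct model_platform {
	bool supports_suspend_mem;	/* assumption: false on the ARM case */
	bool resumed_via_firmware;	/* models pm_resume_via_firmware() */
};

/* Hypothetical reduction of struct amdgpu_device to the one flag we need. */
struct model_device {
	bool in_s3;
};

/* Models amdgpu_acpi_is_s3_active(): true only for a suspend-to-RAM target. */
static bool model_is_s3_active(const struct model_platform *p,
			       enum model_suspend_target target)
{
	return p->supports_suspend_mem && target == MODEL_SUSPEND_MEM;
}

/* Models amdgpu_choose_low_power_state(): in_s3 follows the S3 check. */
static void model_choose_low_power_state(struct model_device *dev,
					 const struct model_platform *p,
					 enum model_suspend_target target)
{
	dev->in_s3 = model_is_s3_active(p, target);
}

/* Models the patched condition in soc15_need_reset_on_resume(). */
static bool model_need_reset_on_resume(const struct model_device *dev,
				       const struct model_platform *p)
{
	return dev->in_s3 && !p->resumed_via_firmware;
}
```

With supports_suspend_mem false, model_choose_low_power_state() leaves
in_s3 clear, so model_need_reset_on_resume() returns false whether or not
the resume came via firmware.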

That being said, on non-x86 we might have more fundamental
suspend/resume issues to worry about than handling an aborted suspend,
so the whole set of amdgpu suspend/resume code might need to be
revisited.

