[PATCH 1/1] amdgpu/soc15: enable asic reset for dGPU in case of suspend abort

Lazar, Lijo lijo.lazar at amd.com
Wed Jan 22 03:24:14 UTC 2025



On 1/22/2025 8:28 AM, Mario Limonciello wrote:
> On 1/20/2025 23:45, Lazar, Lijo wrote:
>>
>>
>> On 1/13/2025 9:10 AM, Jiang Liu wrote:
>>> When GPU suspend is aborted, do the same for dGPU as APU to reset
>>> soc15 asic. Otherwise it may cause following errors:
>>> [  547.229463] amdgpu 0001:81:00.0: [drm:amdgpu_ring_test_helper
>>> [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
>>>
>>> [  555.126827] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper
>>> [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
>>> [  555.126901] [drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ
>>> enable failed
>>> [  555.126957] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
>>> resume of IP block <gfx_v9_4_3> failed -110
>>> [  555.126959] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume
>>> failed (-110).
>>> [  555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns
>>> -110
>>> [  555.126966] PM: Device 0000:0a:00.0 failed to resume async: error
>>> -110
>>>
>>> This fix has been tested on Mi308X.
>>>
>>> Signed-off-by: Jiang Liu <gerry at linux.alibaba.com>
>>> Tested-by: Shuo Liu <shuox.liu at linux.alibaba.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/soc15.c | 8 +++-----
>>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/
>>> amd/amdgpu/soc15.c
>>> index a59b4c36cad7..0e1daefd1a8e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -605,12 +605,10 @@ soc15_asic_reset_method(struct amdgpu_device
>>> *adev)
>>>   static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
>>>   {
>>>       /* Will reset for the following suspend abort cases.
>>> -     * 1) Only reset on APU side, dGPU hasn't checked yet.
>>> -     * 2) S3 suspend aborted in the normal S3 suspend or
>>> -     *    performing pm core test.
>>> +     * 1) S3 suspend aborted in the normal S3 suspend
>>> +     * 2) S3 suspend aborted in performing pm core test.
>>>        */
>>> -    if (adev->flags & AMD_IS_APU && adev->in_s3 &&
>>> -            !pm_resume_via_firmware())
>>> +    if (adev->in_s3 && !pm_resume_via_firmware())
>>>           return true;
>>
>> I don't think this can be applied to all environments. For ex: this may
>> not be applicable for dGPUs combined with ARM CPUs.
> 
> I looked through amdgpu_acpi cases and I'm not sure I agree with this.
> On ARM side amdgpu_acpi_is_s3_active() should never return true because
> ARM doesn't support PM_SUSPEND_MEM.
> 
> This means that amdgpu_choose_low_power_state() shouldn't set is_s3 either.
> 
> So I don't think this block will run on ARM side.

ARM is not the fundamental issue. It's for any platform which supports
suspend/resume outside of ACPI. The original patch is accepted only for
APUs for that reason.

Thanks,
Lijo

> 
> That being said, we might have other more fundamental issues to worry
> about with suspend/resume than handling aborted suspend/resume when run
> on non-x86 so the whole set of amdgpu suspend/resume code might need to
> be revisited.



More information about the amd-gfx mailing list