[PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete

Lazar, Lijo lijo.lazar at amd.com
Thu Oct 24 07:30:28 UTC 2024



On 10/24/2024 12:49 PM, Lazar, Lijo wrote:
> 
> 
> On 10/24/2024 12:23 PM, Liang, Prike wrote:
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>> Sent: Thursday, October 24, 2024 11:39 AM
>>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete
>>>
>>>
>>>
>>> On 10/24/2024 8:24 AM, Liang, Prike wrote:
>>>> [Public]
>>>>
>>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>>> Sent: Wednesday, October 23, 2024 6:55 PM
>>>>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>>>>> Subject: Re: [PATCH v2 2/2] drm/amdgpu: clean up the suspend_complete
>>>>>
>>>>>
>>>>>
>>>>> On 10/14/2024 1:19 PM, Prike Liang wrote:
>>>>>> To check the status of S3 suspend completion, use the PM core
>>>>>> pm_suspend_global_flags bit(1) to detect S3 abort events. Therefore,
>>>>>> clean up the AMDGPU driver's private flag suspend_complete.
>>>>>>
>>>>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 2 --
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 --
>>>>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 ++--
>>>>>>  drivers/gpu/drm/amd/amdgpu/soc15.c      | 7 ++-----
>>>>>>  drivers/gpu/drm/amd/amdgpu/soc21.c      | 2 +-
>>>>>>  5 files changed, 5 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> index 48c9b9b06905..9b35763ae0a7 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> @@ -1111,8 +1111,6 @@ struct amdgpu_device {
>>>>>>     bool                            in_s3;
>>>>>>     bool                            in_s4;
>>>>>>     bool                            in_s0ix;
>>>>>> -   /* indicate amdgpu suspension status */
>>>>>> -   bool                            suspend_complete;
>>>>>>
>>>>>>     enum pp_mp1_state               mp1_state;
>>>>>>     struct amdgpu_doorbell_index doorbell_index; diff --git
>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index 680e44fdee6e..78972151b970 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -2501,7 +2501,6 @@ static int amdgpu_pmops_suspend(struct device
>>> *dev)
>>>>>>     struct drm_device *drm_dev = dev_get_drvdata(dev);
>>>>>>     struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>>>>>
>>>>>> -   adev->suspend_complete = false;
>>>>>>     if (amdgpu_acpi_is_s0ix_active(adev))
>>>>>>             adev->in_s0ix = true;
>>>>>>     else if (amdgpu_acpi_is_s3_active(adev)) @@ -2516,7 +2515,6 @@
>>>>>> static int amdgpu_pmops_suspend_noirq(struct device *dev)
>>>>>>     struct drm_device *drm_dev = dev_get_drvdata(dev);
>>>>>>     struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>>>>>
>>>>>> -   adev->suspend_complete = true;
>>>>>>     if (amdgpu_acpi_should_gpu_reset(adev))
>>>>>>             return amdgpu_asic_reset(adev);
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> index be320d753507..ba8e66744376 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> @@ -3276,8 +3276,8 @@ static int gfx_v9_0_cp_gfx_start(struct
>>>>>> amdgpu_device
>>>>> *adev)
>>>>>>      * confirmed that the APU gfx10/gfx11 needn't such update.
>>>>>>      */
>>>>>>     if (adev->flags & AMD_IS_APU &&
>>>>>> -                   adev->in_s3 && !adev->suspend_complete) {
>>>>>> -           DRM_INFO(" Will skip the CSB packet resubmit\n");
>>>>>> +                   adev->in_s3 && !pm_resume_via_firmware()) {
>>>>>> +           DRM_INFO("Will skip the CSB packet resubmit\n");
>>>>>>             return 0;
>>>>>>     }
>>>>>>     r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 +
>>>>>> 3); diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> index 12ff6cf568dc..d9d11131a744 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>> @@ -584,13 +584,10 @@ static bool soc15_need_reset_on_resume(struct
>>>>> amdgpu_device *adev)
>>>>>>      *    performing pm core test.
>>>>>>      */
>>>>>>     if (adev->flags & AMD_IS_APU && adev->in_s3 &&
>>>>>> -                   !pm_resume_via_firmware()) {
>>>>>> -           adev->suspend_complete = false;
>>>>>> +                   !pm_resume_via_firmware())
>>>>>>             return true;
>>>>>> -   } else {
>>>>>> -           adev->suspend_complete = true;
>>>>>> +   else
>>>>>>             return false;
>>>>>> -   }
>>>>>>  }
>>>>>>
>>>>>>  static int soc15_asic_reset(struct amdgpu_device *adev) diff --git
>>>>>> a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> index c4b950e75133..7a47a21ef00f 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>> @@ -904,7 +904,7 @@ static bool soc21_need_reset_on_resume(struct
>>>>> amdgpu_device *adev)
>>>>>>      * 2) S3 suspend got aborted and TOS is active.
>>>>>>      */
>>>>>>     if (!(adev->flags & AMD_IS_APU) && adev->in_s3 &&
>>>>>> -       !adev->suspend_complete) {
>>>>>> +       !pm_resume_via_firmware()) {
>>>>>
>>>>> Looks like this will cover only ACPI based systems. Not sure if that
>>>>> assumption is valid for dGPU cases.
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>
>>>> Yes, the pm_set_resume_via_firmware() function is only called during the
>>> ACPI_STATE_S3 suspend process. However, ACPI-enabled systems are popular in
>>> the desktop world. If there are concerns about ACPI configuration, one option could
>>> be to check if the dGPU needs a reset by directly checking the SOL register. As far
>>> as I can see, when the dGPU completes its suspend process, the SOL value will
>>> remain zero until the dGPU is resumed. Conversely, in the case of a suspend abort,
>>> the SOL value will be non-zero.
>>>>
>>>
>>> in_s3 is set for dGPU in case of s0ix as well. Probably, that's the only case where
>>> need the flag to avoid unnecessary reset. Otherwise SOL check could be sufficient.
>>>
>>> Thanks,
>>> Lijo
>>>
>> Do you mean we need to include S0ix to reset the dGPU during an S0ix suspend abort? However, the in_s0ix state of the dGPU should always be false, and there is no specific suspension handler for the dGPU in S0ix. As a PCIe endpoint, the dGPU should be powered off during system-wide(S0ix and Sx) suspend, and the SOL will be reset to 0 during the suspend process. So, for the dGPU resume case, do you think it's enough to detect the suspend abort event by only checking SOL without any Sx filter?
>>
> For S0ix, I don't think there is a requirement to turn off all endpoints
> - for dGPUs that don't support D3 or runpm etc. 

To clarify, "don't support D3" => what I meant is when D3 entry is
prevented.

Thanks,
Lijo

Then even if S0ix entry
> got aborted, but GPU is suspended properly there is no need to reset the
> device.
> 
> Thanks,
> Lijo
> 
>>>> Thanks,
>>>> Prike
>>>>>
>>>>>>             sol_reg1 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81);
>>>>>>             msleep(100);
>>>>>>             sol_reg2 = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_81);


More information about the amd-gfx mailing list