[PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
Alex Deucher
alexdeucher at gmail.com
Thu Mar 30 02:08:09 UTC 2023
On Wed, Mar 29, 2023 at 8:49 PM Kai-Heng Feng
<kai.heng.feng at canonical.com> wrote:
>
> On Wed, Mar 29, 2023 at 9:21 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> >
> > On Wed, Mar 29, 2023 at 6:00 AM Kai-Heng Feng
> > <kai.heng.feng at canonical.com> wrote:
> > >
> > > When the power is lost due to ACPI power resources being turned off, the
> > > driver should reset the GPU so it can work anew.
> > >
> > > First, _PR3 support of the hierarchy needs to be found correctly. Since
> > > the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> > > _PR3 on downstream port alone is not enough, as the _PR3 can associate
> > > to the root port above the PCIe switch.
> > >
> > > Once the _PR3 is found and BOCO support is correctly marked, use that
> > > information to inform the GPU should be reset. This solves an issue that
> > > system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> > > is supported for the GFX slot.
> >
> > I don't think we need to reset the GPU. If the power is turned off, a
> > reset shouldn't be necessary. The reset is only necessary when the
> > power is not turned off to put the GPU into a known good state. It
> > should be in that state already if the power is turn off. It sounds
> > like the device is not actually getting powered off.
>
> I had the impression that the GPU gets reset because S3 turned the
> power rail off.
>
> So the actual intention for GPU reset is because S3 doesn't guarantee
> the power is being turned off?
For S4, the reset in freeze is there because once the boot kernel
transitions to the hibernated kernel, we need the reset to bring the
GPU back to a known state. On dGPUs at least there are some engines
that can only be initialized once and then require a reset to be
initialized again. The one in suspend was originally there to deal
with aborted suspends where we'd need to reset the GPU for the same
reason as S4. However, it no longer really serves much purpose since
it got moved to noirq and it could probably be dropped.
Alex
>
> Kai-Heng
>
> >
> > Alex
> >
> > >
> > > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng at canonical.com>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 +++
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 ++++++-
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +++++-------
> > > 3 files changed, 14 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > index 60b1857f469e..407456ac0e84 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
> > > if (amdgpu_sriov_vf(adev))
> > > return false;
> > >
> > > + if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> > > + return true;
> > > +
> > > #if IS_ENABLED(CONFIG_SUSPEND)
> > > return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
> > > #else
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index f5658359ff5c..d56b7a2bafa6 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> > >
> > > if (!(adev->flags & AMD_IS_APU)) {
> > > parent = pci_upstream_bridge(adev->pdev);
> > > - adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > > + do {
> > > + if (pci_pr3_present(parent)) {
> > > + adev->has_pr3 = true;
> > > + break;
> > > + }
> > > + } while ((parent = pci_upstream_bridge(parent)));
> > > }
> > >
> > > amdgpu_amdkfd_device_probe(adev);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index ba5def374368..5d81fcac4b0a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
> > > struct drm_device *drm_dev = dev_get_drvdata(dev);
> > > struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > >
> > > - if (amdgpu_acpi_is_s0ix_active(adev))
> > > - adev->in_s0ix = true;
> > > - else if (amdgpu_acpi_is_s3_active(adev))
> > > + if (amdgpu_acpi_is_s3_active(adev) ||
> > > + amdgpu_device_supports_boco(drm_dev))
> > > adev->in_s3 = true;
> > > + else if (amdgpu_acpi_is_s0ix_active(adev))
> > > + adev->in_s0ix = true;
> > > if (!adev->in_s0ix && !adev->in_s3)
> > > return 0;
> > > return amdgpu_device_suspend(drm_dev, true);
> > > @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> > > adev->no_hw_access = true;
> > >
> > > r = amdgpu_device_resume(drm_dev, true);
> > > - if (amdgpu_acpi_is_s0ix_active(adev))
> > > - adev->in_s0ix = false;
> > > - else
> > > - adev->in_s3 = false;
> > > + adev->in_s0ix = adev->in_s3 = false;
> > > return r;
> > > }
> > >
> > > --
> > > 2.34.1
> > >
More information about the amd-gfx
mailing list