[PATCH 2/2] drm/amdgpu: Fix resume failures when device is gone

Alex Deucher alexdeucher at gmail.com
Mon Sep 20 16:12:10 UTC 2021


Series is:
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>

On Fri, Sep 17, 2021 at 7:31 AM Andrey Grodzovsky
<andrey.grodzovsky at amd.com> wrote:
>
> Problem:
> When the device is suspended and then unplugged while
> suspended, all HW programming during resume fails, leaving
> the SW state broken for the PCI remove handling that follows.
> Because the device is first resumed and only later removed,
> we cannot rely on drm_dev_enter/exit here.
>
> Fix:
> Use the flag we already use for PCIe error recovery to avoid
> accessing registers. This allows the PM resume sequence to
> complete successfully and the PCI remove to finish.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index db21af5e84ed..04fb4e74fb20 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1522,6 +1522,10 @@ static int amdgpu_pmops_resume(struct device *dev)
>         struct amdgpu_device *adev = drm_to_adev(drm_dev);
>         int r;
>
> +       /* Avoid register access if the device is physically gone */
> +       if (!pci_device_is_present(adev->pdev))
> +               adev->no_hw_access = true;
> +
>         r = amdgpu_device_resume(drm_dev, true);
>         if (amdgpu_acpi_is_s0ix_active(adev))
>                 adev->in_s0ix = false;
> --
> 2.25.1
>


More information about the amd-gfx mailing list