[PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

Alex Deucher alexdeucher at gmail.com
Fri May 21 20:18:18 UTC 2021


On Fri, May 21, 2021 at 4:14 PM Andrey Grodzovsky
<andrey.grodzovsky at amd.com> wrote:
>
> Problem:
> When the device goes into a sleep state due to prolonged
> inactivity (e.g. BACO sleep) and is then hot unplugged,
> the PCI core will try to wake up the device as part of
> the unplug process. Since the device is gone, all HW
> programming during rpm resume fails, leading
> to a bad SW state later during PCI remove handling.
>
> Fix:
> Reuse the flag we set for PCIe error recovery to avoid
> accessing registers. This allows the rpm resume sequence
> to complete successfully and PCI remove to finish.

It might make sense to create a preliminary patch to change the name of
this flag to something like no_hw_access, since it's not specific to
PCI error handling.

Alex

>
> P.S. We must use pci_device_is_present and not drm_dev_enter/exit
> here, since rpm resume happens before PCI remove, so the
> unplug flag is not set yet.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d8db5929cdd9..ab95ebf56636 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1555,6 +1555,11 @@ static int amdgpu_pmops_runtime_resume(struct device *dev)
>         if (!adev->runpm)
>                 return -EINVAL;
>
> +       /* Avoids registers access if device is physically gone */
> +       if (!pci_device_is_present(adev->pdev))
> +               adev->in_pci_err_recovery = true;
> +
> +
>         if (amdgpu_device_supports_px(drm_dev)) {
>                 drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>
> --
> 2.25.1
>


More information about the dri-devel mailing list