[PATCH] drm/amdgpu: Fix crash when hot unplug in BACO.

Andrey Grodzovsky andrey.grodzovsky at amd.com
Fri May 21 20:37:09 UTC 2021


Will do.

Andrey

On 2021-05-21 4:18 p.m., Alex Deucher wrote:
> On Fri, May 21, 2021 at 4:14 PM Andrey Grodzovsky
> <andrey.grodzovsky at amd.com> wrote:
>> Problem:
>> When the device goes into a sleep state due to prolonged
>> inactivity (e.g. BACO sleep) and is then hot unplugged,
>> the PCI core will try to wake up the device as part of
>> the unplug process. Since the device is gone, all HW
>> programming during rpm resume fails, leading
>> to a bad SW state later during pci remove handling.
>>
>> Fix:
>> Use the flag we already use for PCIe error recovery to avoid
>> accessing registers. This allows the rpm resume sequence to
>> complete successfully and pci remove to finish.
> Might make sense to create a preliminary patch to change the name of
> this flag to something like no_hw_access since it's not specific to
> pci error handling.
>
> Alex
>
>> P.S. Must use pci_device_is_present() and not drm_dev_enter/exit
>> here, since rpm resume happens before PCI remove, so the
>> unplug flag is not set yet.
>>
>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index d8db5929cdd9..ab95ebf56636 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1555,6 +1555,11 @@ static int amdgpu_pmops_runtime_resume(struct device *dev)
>>          if (!adev->runpm)
>>                  return -EINVAL;
>>
>> +       /* Avoid register access if the device is physically gone */
>> +       if (!pci_device_is_present(adev->pdev))
>> +               adev->in_pci_err_recovery = true;
>> +
>> +
>>          if (amdgpu_device_supports_px(drm_dev)) {
>>                  drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>>
>> --
>> 2.25.1
>>


More information about the dri-devel mailing list