[PATCH v4 8/8] Revert "PCI/ERR: Update error status after reset_link()"
Kuppuswamy, Sathyanarayanan
sathyanarayanan.kuppuswamy at linux.intel.com
Wed Sep 2 19:00:05 UTC 2020
On 9/2/20 11:42 AM, Andrey Grodzovsky wrote:
> This reverts commit 6d2c89441571ea534d6240f7724f518936c44f8d.
>
> In the code below
>
> pci_walk_bus(bus, report_frozen_detected, &status);
> - if (reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> + status = reset_link(dev, service);
>
> status returned from report_frozen_detected is unconditionally masked
> by status returned from reset_link which is wrong.
>
> This breaks error recovery implementation for AMDGPU driver
> by masking PCI_ERS_RESULT_NEED_RESET returned from amdgpu_pci_error_detected
> and hence skipping slot reset callback which is necessary for proper
> ASIC recovery. Effectively no other callback besides resume callback will
> be called after link reset the way it is implemented now regardless of what
> value error_detected callback returns.
>
Instead of reverting this change, can you try the following patch?
https://lore.kernel.org/linux-pci/56ad4901-725f-7b88-2117-b124b28b027f@linux.intel.com/T/#me8029c04f63c21f9d1cb3b1ba2aeffbca3a60df5
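
For reference, the masking described in the quoted commit message can be illustrated with a small user-space sketch. It is not the kernel code: the enum and the mock_* helpers are hypothetical stand-ins for pci_ers_result_t, a driver's error_detected callback, and reset_link(). It assumes the recovery path only runs the slot reset callback when status is still "need reset" after the link reset.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result {
    ERS_CAN_RECOVER,
    ERS_NEED_RESET,
    ERS_DISCONNECT,
    ERS_RECOVERED,
};

/* Mock of a driver's error_detected callback that requests a slot reset. */
static enum ers_result mock_error_detected(void)
{
    return ERS_NEED_RESET;
}

/* Mock of a reset_link() call that succeeds. */
static enum ers_result mock_reset_link(void)
{
    return ERS_RECOVERED;
}

/*
 * Variant described in the quoted text: the result of the link reset
 * replaces whatever the error_detected callback reported.
 * Returns true if the slot reset callback would be invoked.
 */
bool slot_reset_called_overwrite(void)
{
    enum ers_result status = mock_error_detected();

    status = mock_reset_link();      /* NEED_RESET is lost here */
    return status == ERS_NEED_RESET;
}

/*
 * Variant that only checks the link reset for failure and otherwise
 * keeps the status reported by the error_detected callback.
 */
bool slot_reset_called_preserve(void)
{
    enum ers_result status = mock_error_detected();

    if (mock_reset_link() != ERS_RECOVERED)
        return false;                /* recovery failed, no slot reset */
    return status == ERS_NEED_RESET;
}
```

Under these assumptions the first variant never reaches the slot reset step once the link reset succeeds, while the second still honours the driver's request.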
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer