[PATCH v4 8/8] Revert "PCI/ERR: Update error status after reset_link()"

Kuppuswamy, Sathyanarayanan sathyanarayanan.kuppuswamy at linux.intel.com
Wed Sep 2 19:00:05 UTC 2020



On 9/2/20 11:42 AM, Andrey Grodzovsky wrote:
> This reverts commit 6d2c89441571ea534d6240f7724f518936c44f8d.
> 
> In the code below
> 
>                  pci_walk_bus(bus, report_frozen_detected, &status);
> -               if (reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> +               status = reset_link(dev, service);
> 
> status returned from report_frozen_detected is unconditionally masked
> by status returned from reset_link which is wrong.
> 
> This breaks error recovery implementation for AMDGPU driver
> by masking PCI_ERS_RESULT_NEED_RESET returned from amdgpu_pci_error_detected
> and hence skipping slot reset callback which is necessary for proper
> ASIC recovery. Effectively no other callback besides resume callback will
> be called after link reset the way it is implemented now regardless of what
> value error_detected callback returns.
> 
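To make the masking described above concrete, here is a minimal userspace sketch in C. It is not the kernel code: the enum and both function names are hypothetical stand-ins, and it assumes the recovery path aborts when the link reset does not report "recovered" and otherwise runs the slot reset step only when status is "need reset".

```c
#include <stdbool.h>

/* Simplified model; names only loosely mirror the kernel's PCI_ERS_RESULT_* values. */
enum ers_result {
    ERS_NEED_RESET,
    ERS_RECOVERED,
    ERS_DISCONNECT
};

/*
 * Models the "+" line: status = reset_link(dev, service);
 * 'detected' stands for the value aggregated from the error_detected
 * callbacks, 'link' for the value returned by the link reset.
 * Returns true if the slot reset step would run.
 */
bool slot_reset_runs_with_overwrite(enum ers_result detected, enum ers_result link)
{
    enum ers_result status = detected;

    status = link;                    /* the detected value is lost here */
    if (status != ERS_RECOVERED)
        return false;                 /* recovery is abandoned */
    return status == ERS_NEED_RESET;  /* can never be true at this point */
}

/*
 * Models the "-" line: if (reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
 * The link reset result is only compared, so 'status' keeps the detected value.
 */
bool slot_reset_runs_with_compare(enum ers_result detected, enum ers_result link)
{
    enum ers_result status = detected;

    if (link != ERS_RECOVERED)
        return false;                 /* recovery is abandoned */
    return status == ERS_NEED_RESET;  /* the driver's request survives */
}
```

Under these assumptions, a driver that reports "need reset" followed by a successful link reset reaches the slot reset step only in the compare variant; in the overwrite variant no input combination reaches it.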

Instead of reverting this change, can you try the following patch?
https://lore.kernel.org/linux-pci/56ad4901-725f-7b88-2117-b124b28b027f@linux.intel.com/T/#me8029c04f63c21f9d1cb3b1ba2aeffbca3a60df5

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

