[PATCH v3 0/8] Implement PCI Error Recovery on Navi12

Nirmoy nirmodas at amd.com
Mon Aug 31 16:47:01 UTC 2020


Hi Andrey,


I need to understand more about the PCI saved state, so excluding patch 5,
the series is Acked-by: Nirmoy Das <nirmoy.das at amd.com>.



Regards,

Nirmoy


On 8/31/20 5:50 PM, Andrey Grodzovsky wrote:
> Many PCI bus controllers are able to detect a variety of hardware PCI errors on the bus,
> such as parity errors on the data and address buses. A typical action taken is to disconnect
> the affected device, halting all I/O to it. Typically, a reconnection mechanism is also offered,
> so that the affected PCI device(s) are reset and put back into working condition.
> In our case the reconnection mechanism is provided by the kernel Downstream Port Containment (DPC)
> driver, which intercepts the PCIe error and removes (isolates) the faulting device, after which it
> calls into the PCIe recovery code of the PCI core.
> That code calls the hooks implemented in this patchset. First the error is
> reported, at which point we block the GPU scheduler. Next, DPC resets the
> PCI link, which generates a HW interrupt that is intercepted by SMU/PSP, which
> start executing a mode1 reset of the ASIC. Then the slot reset hook is called,
> at which point we wait for the ASIC reset to complete, restore PCI config space and run
> the HW suspend/resume sequence to reinit the ASIC.
> The last hook called is resume normal operation, at which point we restart the GPU scheduler.
>
> Andrey Grodzovsky (8):
>    drm/amdgpu: Implement DPC recovery
>    drm/amdgpu: Avoid accessing HW when suspending SW state
>    drm/amdgpu: Block all job scheduling activity during DPC recovery
>    drm/amdgpu: Fix SMU error failure
>    drm/amdgpu: Fix consecutive DPC recovery failures.
>    drm/amdgpu: Trim amdgpu_pci_slot_reset by reusing code.
>    drm/amdgpu: Disable DPC for XGMI for now.
>    drm/amdgpu: Minor checkpatch fix
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  16 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 298 ++++++++++++++++++++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  13 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    |   6 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     |  18 +-
>   drivers/gpu/drm/amd/amdgpu/nv.c            |   4 +-
>   drivers/gpu/drm/amd/amdgpu/soc15.c         |   4 +-
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c     |   3 +
>   9 files changed, 346 insertions(+), 22 deletions(-)
>
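For readers following the hook order described in the cover letter, here is a
minimal userspace sketch of that sequence. It is not the code from this series:
every type and function name below (mock_gpu, mock_err_handlers, mock_recover,
the ERS_* values) is a hypothetical stand-in for illustration only. The real
callbacks are registered through struct pci_error_handlers in <linux/pci.h>,
and the real slot reset hook waits on hardware rather than checking a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's recovery result codes. */
enum ers_result { ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

/* Hypothetical device state; the real driver tracks far more than this. */
struct mock_gpu {
	bool sched_blocked;   /* GPU scheduler is stopped */
	bool asic_reset_done; /* firmware (SMU/PSP) finished the mode1 reset */
	bool cfg_restored;    /* PCI config space was restored */
	bool hw_reinit_done;  /* HW suspend/resume sequence was re-run */
};

/* Hypothetical hook table mirroring the three hooks named in the cover letter. */
struct mock_err_handlers {
	enum ers_result (*error_detected)(struct mock_gpu *gpu);
	enum ers_result (*slot_reset)(struct mock_gpu *gpu);
	void (*resume)(struct mock_gpu *gpu);
};

/* Step 1: the error is reported, so block the scheduler and ask for a reset. */
static enum ers_result gpu_error_detected(struct mock_gpu *gpu)
{
	gpu->sched_blocked = true;
	return ERS_NEED_RESET;
}

/* Step 3: after the ASIC reset, restore config space and re-init the HW. */
static enum ers_result gpu_slot_reset(struct mock_gpu *gpu)
{
	if (!gpu->asic_reset_done)
		return ERS_DISCONNECT; /* reset never completed: give up */
	gpu->cfg_restored = true;
	gpu->hw_reinit_done = true;
	return ERS_RECOVERED;
}

/* Step 4: resume normal operation by restarting the scheduler. */
static void gpu_resume(struct mock_gpu *gpu)
{
	assert(gpu->hw_reinit_done);
	gpu->sched_blocked = false;
}

const struct mock_err_handlers gpu_err_handlers = {
	.error_detected = gpu_error_detected,
	.slot_reset = gpu_slot_reset,
	.resume = gpu_resume,
};

/*
 * Mock of the core recovery flow. fw_reset_ok models step 2: the link
 * reset raises an interrupt and the firmware performs the ASIC reset.
 */
bool mock_recover(struct mock_gpu *gpu, const struct mock_err_handlers *h,
		  bool fw_reset_ok)
{
	if (h->error_detected(gpu) != ERS_NEED_RESET)
		return false;
	gpu->asic_reset_done = fw_reset_ok;
	if (h->slot_reset(gpu) != ERS_RECOVERED)
		return false;
	h->resume(gpu);
	return true;
}
```

In this sketch a failed firmware reset leaves the scheduler blocked, since the
resume hook is never reached; whether the actual driver behaves that way is
not stated in the cover letter.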

