[PATCH 00/10] Support XGMI reset on init
Christian König
ckoenig.leichtzumerken at gmail.com
Wed Sep 4 12:47:59 UTC 2024
On 02.09.24 at 09:34, Lijo Lazar wrote:
> There are cases where a device needs to be reset before it is fully
> initialized. One example is a driver reinstallation with a different version
> of PSP TOS. In such a case, if the device supports a reset in which PSP TOS is
> unloaded, the driver needs to reset the device first and then load the new
> firmware components.
>
> For devices in an XGMI hive, a reset needs to be sent to all devices in the
> hive. Thus the driver should first discover the devices that belong to a hive
> with PSP support.
>
> There is an existing delayed reset handler; however, it has the following
> limitations:
> 1) It doesn't discover devices in the hive; instead it tries to do an XGMI reset
> for all devices registered to the mgpu struct. The mgpu struct may contain
> devices other than those that belong to a hive. Also, it doesn't work if there
> is more than one hive.
> 2) It doesn't take a reset lock, and since this is a delayed reset, that could
> result in unwanted hardware accesses during a reset.
> 3) It doesn't initialize RAS properly (left as TODO).
>
> This series overcomes the above limitations. Instead of marking a pending reset,
> it defines init levels that specify how far initialization proceeds. In
> case of a pending reset, only specific hardware blocks may be initialized.
>
> Further work (not done in this series) may add fine-grained controls
> for init levels, for example skipping features like DPM enablement or skipping
> the load of specific firmware that won't be required during a minimal init
> scenario where the device is going to be reset.
>
> The series adds an interface to check whether a PSP TOS reload is required.
At least from a high level that sounds totally sane, but I have no
idea where I would find the time to review the details.
I need to discuss that with Alex and/or Tim. Maybe I can delegate some
more work.
Christian.
>
>
> Lijo Lazar (10):
> drm/amdgpu: Add init levels
> drm/amdgpu: Use init level for pending_reset flag
> drm/amdgpu: Separate reinitialization after reset
> drm/amdgpu: Add reset on init handler for XGMI
> drm/amdgpu: Add helper to initialize badpage info
> drm/amdgpu: Refactor XGMI reset on init handling
> drm/amdgpu: Drop delayed reset work handler
> drm/amdgpu: Support reset-on-init on select SOCs
> drm/amdgpu: Add interface for TOS reload cases
> drm/amdgpu: Add PSP reload case to reset-on-init
>
> drivers/gpu/drm/amd/amdgpu/aldebaran.c | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 21 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 245 +++++++++++-------
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 81 ------
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 13 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 3 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 62 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 148 +++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 4 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 72 ++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 2 +
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 14 +-
> drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 25 ++
> drivers/gpu/drm/amd/amdgpu/soc15.c | 7 +
> .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 3 +-
> 17 files changed, 492 insertions(+), 214 deletions(-)
>
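
For readers skimming the cover letter above, the two ideas it describes (collecting only the devices that share a hive, and choosing a reduced init level when a reset is pending) can be sketched in standalone C. This is only an illustration under stated assumptions and is not code from the series: every identifier prefixed with sketch_ is hypothetical, the set of blocks included in the minimal level is an arbitrary choice made for the example, and treating hive ID 0 as "not in a hive" is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none are taken from the amdgpu patches. */
enum sketch_init_level {
    SKETCH_INIT_DEFAULT,  /* full initialization */
    SKETCH_INIT_MINIMAL,  /* reduced init before a pending reset */
};

/* One bit per hardware block; the block list is illustrative only. */
enum sketch_hw_block {
    SKETCH_BLOCK_COMMON = 1u << 0,
    SKETCH_BLOCK_GMC    = 1u << 1,
    SKETCH_BLOCK_PSP    = 1u << 2,
    SKETCH_BLOCK_SMU    = 1u << 3,
    SKETCH_BLOCK_GFX    = 1u << 4,
    SKETCH_BLOCK_ALL    = 0x1f,
};

struct sketch_device {
    uint64_t hive_id;    /* assumption: 0 means "not part of a hive" */
    int reset_pending;   /* nonzero if the device must be reset first */
};

/* Map an init level to the mask of blocks that would be brought up. */
static uint32_t sketch_blocks_for_level(enum sketch_init_level level)
{
    switch (level) {
    case SKETCH_INIT_MINIMAL:
        /* Arbitrary subset chosen for the example. */
        return SKETCH_BLOCK_COMMON | SKETCH_BLOCK_GMC |
               SKETCH_BLOCK_PSP | SKETCH_BLOCK_SMU;
    case SKETCH_INIT_DEFAULT:
    default:
        return SKETCH_BLOCK_ALL;
    }
}

/* A device with a pending reset only gets the minimal level. */
static enum sketch_init_level
sketch_pick_level(const struct sketch_device *dev)
{
    return dev->reset_pending ? SKETCH_INIT_MINIMAL : SKETCH_INIT_DEFAULT;
}

/*
 * Store the indices of all devices that belong to the given hive in out[]
 * (which must have room for n entries) and return how many were found.
 * Devices in other hives, or in no hive, are skipped, so a reset would be
 * applied per hive rather than to every registered device.
 */
static size_t sketch_collect_hive(const struct sketch_device *devs, size_t n,
                                  uint64_t hive_id, size_t *out)
{
    size_t count = 0;

    assert(devs != NULL && out != NULL);
    if (hive_id == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].hive_id == hive_id)
            out[count++] = i;
    }
    return count;
}
```

In this sketch a caller would first pick the level for each device, initialize only the blocks in the returned mask, and then call sketch_collect_hive() once per distinct hive ID so that each hive is reset as a group.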