[PATCH v7 00/45] Recover from failure to probe GPU

Alex Deucher alexdeucher at gmail.com
Thu Jan 5 17:35:25 UTC 2023


On Thu, Jan 5, 2023 at 12:02 PM Mario Limonciello
<mario.limonciello at amd.com> wrote:
>
> One of the first thing that KMS drivers do during initialization is
> destroy the system firmware framebuffer by means of
> `drm_aperture_remove_conflicting_pci_framebuffers`
>
> This means that if for any reason the GPU failed to probe the user
> will be stuck with at best a screen frozen at the last thing that
> was shown before the KMS driver continued it's probe.
>
> The problem is most pronounced when new GPU support is introduced
> because users will need to have a recent linux-firmware snapshot
> on their system when they boot a kernel with matching support.
>
> However the problem is further exaggerated in the case of amdgpu because
> it has migrated to "IP discovery" where amdgpu will attempt to load
> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> contained in that GPU.
>
> IP discovery requires some probing and isn't run until after the
> framebuffer has been destroyed.
>
> This means a situation can occur where a user purchases a new GPU not
> yet supported by a distribution and when booting the installer it will
> "freeze" even if the distribution doesn't have the matching kernel support
> for those IP blocks.
>
> The perfect example of this is Ubuntu 22.10 and the new dGPUs just
> launched by AMD.  The installation media ships with kernel 5.19 (which
> has IP discovery) but the amdgpu support for those IP blocks landed in
> kernel 6.0. The matching linux-firmware was released after 22.10's launch.
> The screen will freeze without nomodeset. Even if a user manages to install
> and then upgrades to kernel 6.0 after install they'll still have the
> problem of missing firmware, and the same experience.
>
> This is quite jarring for users, particularly if they don't know
> that they have to use "nomodeset" to install.
>
> To help the situation make changes to GPU discovery:
> 1) Delay releasing the firmware framebuffer until after early_init
> completed.  This will help the situation of an older kernel that doesn't
> yet support the IP blocks probing a new GPU. IP discovery will have failed.
> 2) Request loading all PSP, VCN, SDMA, SMU, DMCUB, MES and GC microcode
> into memory during early_init. This will help the situation of new enough
> kernel for the IP discovery phase to otherwise pass but missing microcode
> from linux-firmware.git.

Series is:
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>

>
> v6->v7:
>  * Pick up tags
>  * Fix PSP TAv1 handling to match previous behavior (securedisplay_context
>    only is set on PSPv10 and PSPv12/Renoir)
> v5->v6:
>  * Fix arguments for amdgpu_ucode_release to allow clearing pointer
>  * Fix whitespace mistake in VCN
>  * Pick up tags
> v4->v5:
>  * Rename amdgpu_ucode_load to amdgpu_ucode_request
>  * Add and utilize amdgpu_ucode_release throughout existing patches
>  * Update all amdgpu code to stop using request_firmware and
>    release_firmware for microcode
>  * Drop export of amdgpu_ucode_validate outside of amdgpu_ucode.c
>  * Pick up relevant tags for some patches
> v3->v4:
>  * Rework to delay framebuffer release until early_init is done
>  * Make IP load microcode during early init phase
>  * Add SMU and DMCUB checks for early_init loading
>  * Add some new helper code for wrapping request_firmware calls (needed for
>    early_init to return something besides -ENOENT)
> v2->v3:
>  * Pick up tags for patches 1-10
>  * Rework patch 11 to not validate during discovery
>  * Fix bugs with GFX9 due to gfx.num_gfx_rings not being set during
>    discovery
>  * Fix naming scheme for SDMA on dGPUs
> v1->v2:
>  * Take the suggestion from v1 thread to delay the framebuffer release
>    until ip discovery is done. This patch is CC to stable to that older
>    stable kernels with IP discovery won't try to probe unknown IP.
>  * Drop changes to drm aperature.
>  * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery.
>
> Mario Limonciello (27):
>   drm/amd: Delay removal of the firmware framebuffer
>   drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
>   drm/amd: Convert SMUv11 microcode to use
>     `amdgpu_ucode_ip_version_decode`
>   drm/amd: Convert SMUv13 microcode to use
>     `amdgpu_ucode_ip_version_decode`
>   drm/amd: Add a new helper for loading/validating microcode
>   drm/amd: Use `amdgpu_ucode_request` helper for SDMA
>   drm/amd: Convert SDMA to use `amdgpu_ucode_ip_version_decode`
>   drm/amd: Make SDMA firmware load failures less noisy.
>   drm/amd: Use `amdgpu_ucode_*` helpers for VCN
>   drm/amd: Load VCN microcode during early_init
>   drm/amd: Load MES microcode during early_init
>   drm/amd: Use `amdgpu_ucode_*` helpers for MES
>   drm/amd: Remove superfluous assignment for `adev->mes.adev`
>   drm/amd: Use `amdgpu_ucode_*` helpers for GFX9
>   drm/amd: Load GFX9 microcode during early_init
>   drm/amd: Use `amdgpu_ucode_*` helpers for GFX10
>   drm/amd: Load GFX10 microcode during early_init
>   drm/amd: Use `amdgpu_ucode_*` helpers for GFX11
>   drm/amd: Load GFX11 microcode during early_init
>   drm/amd: Parse both v1 and v2 TA microcode headers using same function
>   drm/amd: Avoid BUG() for case of SRIOV missing IP version
>   drm/amd: Load PSP microcode during early_init
>   drm/amd: Use `amdgpu_ucode_*` helpers for PSP
>   drm/amd/display: Load DMUB microcode during early_init
>   drm/amd: Use `amdgpu_ucode_release` helper for DMUB
>   drm/amd: Use `amdgpu_ucode_*` helpers for SMU
>   drm/amd: Load SMU microcode during early_init
>   drm/amd: Optimize SRIOV switch/case for PSP microcode load
>   drm/amd: Use `amdgpu_ucode_*` helpers for GFX6
>   drm/amd: Use `amdgpu_ucode_*` helpers for GFX7
>   drm/amd: Use `amdgpu_ucode_*` helpers for GFX8
>   drm/amd: Use `amdgpu_ucode_*` helpers for GMC6
>   drm/amd: Use `amdgpu_ucode_*` helpers for GMC7
>   drm/amd: Use `amdgpu_ucode_*` helpers for GMC8
>   drm/amd: Use `amdgpu_ucode_*` helpers for SDMA2.4
>   drm/amd: Use `amdgpu_ucode_*` helpers for SDMA3.0
>   drm/amd: Use `amdgpu_ucode_*` helpers for SDMA on CIK
>   drm/amd: Use `amdgpu_ucode_*` helpers for UVD
>   drm/amd: Use `amdgpu_ucode_*` helpers for VCE
>   drm/amd: Use `amdgpu_ucode_*` helpers for CGS
>   drm/amd: Use `amdgpu_ucode_*` helpers for GPU info bin
>   drm/amd: Use `amdgpu_ucode_*` helpers for DMCU
>   drm/amd: Use `amdgpu_ucode_release` helper for powerplay
>   drm/amd: Use `amdgpu_ucode_release` helper for si
>   drm/amd: make amdgpu_ucode_validate static
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c       |  11 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  22 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c       |  59 ++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h       |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       | 299 +++++++++---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |  25 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     | 259 ++++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h     |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c       |  14 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c       |  14 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  65 +---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h       |   1 +
>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c         |  16 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        | 155 +++------
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        | 124 +++-----
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c         |  30 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c         |  68 +---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |  94 ++----
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 140 ++------
>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c         |  14 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c         |  13 +-
>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c         |  13 +-
>  drivers/gpu/drm/amd/amdgpu/imu_v11_0.c        |   7 +-
>  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        | 108 ++-----
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  99 ++----
>  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c        |  80 +----
>  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c        | 131 +-------
>  drivers/gpu/drm/amd/amdgpu/psp_v12_0.c        |  79 +----
>  drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  27 +-
>  drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c      |  14 +-
>  drivers/gpu/drm/amd/amdgpu/psp_v3_1.c         |  16 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c        |  18 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c        |  18 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  47 +--
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  30 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  55 +---
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  25 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c         |   5 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c         |   5 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c         |   5 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c         |   5 +-
>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c         |   5 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 110 ++++---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c    |  11 +-
>  .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  |   3 +-
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c     |  12 +-
>  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  51 +--
>  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  28 +-
>  50 files changed, 900 insertions(+), 1545 deletions(-)
>
> --
> 2.34.1
>


More information about the amd-gfx mailing list