[PATCH v2 00/11] Recover from failure to probe GPU

Alex Deucher alexdeucher at gmail.com
Thu Dec 29 17:31:03 UTC 2022


Patches 1-10 are:
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>

On Wed, Dec 28, 2022 at 11:31 AM Mario Limonciello
<mario.limonciello at amd.com> wrote:
>
> One of the first things that KMS drivers do during initialization is
> destroy the system firmware framebuffer by means of
> `drm_aperture_remove_conflicting_pci_framebuffers`.
>
> This means that if the GPU fails to probe for any reason, the user
> will be stuck with, at best, a screen frozen at the last thing that
> was shown before the KMS driver continued its probe.
>
> The problem is most pronounced when new GPU support is introduced
> because users will need to have a recent linux-firmware snapshot
> on their system when they boot a kernel with matching support.
>
> However, the problem is further exacerbated in the case of amdgpu because
> it has migrated to "IP discovery", where amdgpu will attempt to load
> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> contained in that GPU.
>
> IP discovery requires some probing and isn't run until after the
> framebuffer has been destroyed.
>
> This means a situation can occur where a user purchases a new GPU not
> yet supported by a distribution, and when booting the installer it will
> "freeze" even though the distribution doesn't have the matching kernel
> support for those IP blocks.
>
> The perfect example of this is Ubuntu 22.10 and the new dGPUs just
> launched by AMD.  The installation media ships with kernel 5.19 (which
> has IP discovery) but the amdgpu support for those IP blocks landed in
> kernel 6.0. The matching linux-firmware was released after 22.10's launch.
> The screen will freeze without nomodeset. Even if a user manages to install
> and then upgrades to kernel 6.0 after install, they'll still have the
> problem of missing firmware, and the same experience.
>
> This is quite jarring for users, particularly if they don't know
> that they have to use "nomodeset" to install.
>
> To help the situation, make changes to GPU discovery:
> 1) Delay releasing the firmware framebuffer until after IP discovery has
> completed.  This will help the situation of an older kernel that doesn't
> yet support the IP blocks probing a new GPU.
> 2) Request loading all PSP, VCN, SDMA, MES and GC microcode into memory
> during IP discovery. This will help the situation of a kernel new enough
> for the IP discovery phase to otherwise pass but missing microcode from
> linux-firmware.git.
>
> Not all requested firmware will be loaded during IP discovery as some of it
> will require larger driver architecture changes. For example SMU firmware
> isn't loaded on certain products, but that's not known until later on when
> the early_init phase of the SMU load occurs.
>
> v1->v2:
>  * Take the suggestion from v1 thread to delay the framebuffer release until
>    IP discovery is done. This patch is CC'd to stable so that older stable
>    kernels with IP discovery won't try to probe unknown IP.
>  * Drop changes to drm aperture.
>  * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery.
>
> Mario Limonciello (11):
>   drm/amd: Delay removal of the firmware framebuffer
>   drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
>   drm/amd: Convert SMUv11 microcode init to use
>     `amdgpu_ucode_ip_version_decode`
>   drm/amd: Convert SMU v13 to use `amdgpu_ucode_ip_version_decode`
>   drm/amd: Request SDMA microcode during IP discovery
>   drm/amd: Request VCN microcode during IP discovery
>   drm/amd: Request MES microcode during IP discovery
>   drm/amd: Request GFX9 microcode during IP discovery
>   drm/amd: Request GFX10 microcode during IP discovery
>   drm/amd: Request GFX11 microcode during IP discovery
>   drm/amd: Request PSP microcode during IP discovery
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   8 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 590 +++++++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |   9 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     | 208 ++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  85 +--
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        | 180 +-----
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  64 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 143 +----
>  drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        |  28 -
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  25 +-
>  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c        | 106 +---
>  drivers/gpu/drm/amd/amdgpu/psp_v11_0.c        | 165 +----
>  drivers/gpu/drm/amd/amdgpu/psp_v12_0.c        | 102 +--
>  drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  82 ---
>  drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c      |  36 --
>  drivers/gpu/drm/amd/amdgpu/psp_v3_1.c         |  36 --
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  61 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  42 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  65 +-
>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  30 +-
>  .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  35 +-
>  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  12 +-
>  25 files changed, 919 insertions(+), 1203 deletions(-)
>
>
> base-commit: de9a71e391a92841582ca3008e7b127a0b8ccf41
> --
> 2.34.1
>
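
For readers of the archive, the probe ordering described in the cover letter
can be sketched as a small user-space model. This is only an illustration
under stated assumptions: `model_gpu`, `model_ip_discovery`, `model_probe` and
`model_selfcheck` are hypothetical names, not functions from amdgpu or the
patches, and the model treats all microcode as requested during discovery,
whereas the cover letter notes that some firmware (for example SMU) is still
requested later.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a GPU at probe time; not an amdgpu structure. */
struct model_gpu {
	bool ip_blocks_supported;  /* kernel knows every IP block on the GPU */
	bool microcode_available;  /* linux-firmware has the needed files */
	bool firmware_fb_alive;    /* system firmware framebuffer still usable */
};

/* Stand-in for IP discovery plus the early microcode requests. */
int model_ip_discovery(const struct model_gpu *gpu)
{
	if (!gpu->ip_blocks_supported)
		return -ENODEV;  /* older kernel, newer GPU */
	if (!gpu->microcode_available)
		return -ENOENT;  /* new enough kernel, missing microcode */
	return 0;
}

/* Probe with the ordering described in the cover letter. */
int model_probe(struct model_gpu *gpu)
{
	int ret = model_ip_discovery(gpu);

	if (ret)
		return ret;  /* bail out; firmware framebuffer left intact */

	gpu->firmware_fb_alive = false;  /* only now take over the display */
	return 0;
}

/* Exercise the three cases; returns 0 if every assertion holds. */
int model_selfcheck(void)
{
	struct model_gpu old_kernel = { false, true, true };
	struct model_gpu no_fw = { true, false, true };
	struct model_gpu ok = { true, true, true };

	assert(model_probe(&old_kernel) == -ENODEV && old_kernel.firmware_fb_alive);
	assert(model_probe(&no_fw) == -ENOENT && no_fw.firmware_fb_alive);
	assert(model_probe(&ok) == 0 && !ok.firmware_fb_alive);
	return 0;
}
```

In this model a failed probe returns before the framebuffer is touched, so the
user keeps a working display in both failure cases the cover letter lists.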

