[PATCH v2 00/11] Recover from failure to probe GPU

Lazar, Lijo lijo.lazar at amd.com
Tue Jan 3 10:10:26 UTC 2023



On 12/28/2022 10:00 PM, Mario Limonciello wrote:
> One of the first things that KMS drivers do during initialization is
> destroy the system firmware framebuffer by means of
> `drm_aperture_remove_conflicting_pci_framebuffers`.
> 
> This means that if for any reason the GPU fails to probe, the user
> will be stuck with, at best, a screen frozen at the last thing that
> was shown before the KMS driver continued its probe.
> 
> The problem is most pronounced when new GPU support is introduced
> because users will need to have a recent linux-firmware snapshot
> on their system when they boot a kernel with matching support.
> 
> However, the problem is further exacerbated in the case of amdgpu because
> it has migrated to "IP discovery" where amdgpu will attempt to load
> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> contained in that GPU.
> 
> IP discovery requires some probing and isn't run until after the
> framebuffer has been destroyed.
> 
> This means a situation can occur where a user purchases a new GPU not
> yet supported by a distribution and when booting the installer it will
> "freeze" even if the distribution doesn't have the matching kernel support
> for those IP blocks.
> 
> The perfect example of this is Ubuntu 22.10 and the new dGPUs just
> launched by AMD.  The installation media ships with kernel 5.19 (which
> has IP discovery) but the amdgpu support for those IP blocks landed in
> kernel 6.0. The matching linux-firmware was released after 22.10's launch.
> The screen will freeze without nomodeset. Even if a user manages to install
> and then upgrades to kernel 6.0 after install they'll still have the
> problem of missing firmware, and the same experience.
> 
> This is quite jarring for users, particularly if they don't know
> that they have to use "nomodeset" to install.
> 
> To help the situation make changes to GPU discovery:
> 1) Delay releasing the firmware framebuffer until after IP discovery has
> completed.  This will help the situation of an older kernel that doesn't
> yet support the IP blocks probing a new GPU.
> 2) Request loading all PSP, VCN, SDMA, MES and GC microcode into memory
> during IP discovery. This will help the situation of a kernel new enough
> for the IP discovery phase to otherwise pass but missing microcode from
> linux-firmware.git.
> 
> Not all requested firmware will be loaded during IP discovery as some of it
> will require larger driver architecture changes. For example SMU firmware
> isn't loaded on certain products, but that's not known until later on when
> the early_init phase of the SMU load occurs.
> 
> v1->v2:
>   * Take the suggestion from v1 thread to delay the framebuffer release until
>     IP discovery is done. This patch is CC'd to stable so that older stable
>     kernels with IP discovery won't try to probe unknown IP.
>   * Drop changes to drm aperture.
>   * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery.
> 
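
The ordering described in the cover letter above can be modeled in a few
lines of userspace C. This is only an illustrative sketch, not code from the
series: `fake_gpu`, `fake_probe`, the result codes and the firmware file
names are all hypothetical, and the real driver spreads this work across
several kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are amdgpu symbols. */
enum probe_result {
    PROBE_OK = 0,
    PROBE_ERR_UNKNOWN_IP = -1,  /* IP discovery found unsupported blocks */
    PROBE_ERR_MISSING_FW = -2,  /* a required microcode file is absent */
};

struct fake_gpu {
    bool ip_blocks_known;       /* does this "kernel" support the IP blocks? */
    const char **fw_available;  /* NULL-terminated list of files on disk */
    const char **fw_needed;     /* NULL-terminated list of files required */
    bool firmware_fb_alive;     /* is the firmware framebuffer still usable? */
};

/* Return true if the named firmware file is in the available list. */
static bool fw_present(const struct fake_gpu *gpu, const char *name)
{
    for (const char **p = gpu->fw_available; p && *p; p++)
        if (strcmp(*p, name) == 0)
            return true;
    return false;
}

static enum probe_result fake_probe(struct fake_gpu *gpu)
{
    assert(gpu != NULL);

    /* Step 1: IP discovery runs while the firmware framebuffer is intact. */
    if (!gpu->ip_blocks_known)
        return PROBE_ERR_UNKNOWN_IP;

    /* Step 2: check every required microcode file during discovery. */
    for (const char **p = gpu->fw_needed; p && *p; p++)
        if (!fw_present(gpu, *p))
            return PROBE_ERR_MISSING_FW;

    /* Only after both steps pass is the firmware framebuffer released. */
    gpu->firmware_fb_alive = false;
    return PROBE_OK;
}
```

In this model a failed probe always returns with `firmware_fb_alive` still
set, so the display stays usable. It does not cover failures that happen
after the framebuffer has been released.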

What is the gain here in just checking if firmware files are available? 
It can fail anywhere during sw_init and it's the same situation.

Restricting IP FWs to IP specific files looks better to me than 
centralizing and creating interdependencies.

Thanks,
Lijo

> Mario Limonciello (11):
>    drm/amd: Delay removal of the firmware framebuffer
>    drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
>    drm/amd: Convert SMUv11 microcode init to use
>      `amdgpu_ucode_ip_version_decode`
>    drm/amd: Convert SMU v13 to use `amdgpu_ucode_ip_version_decode`
>    drm/amd: Request SDMA microcode during IP discovery
>    drm/amd: Request VCN microcode during IP discovery
>    drm/amd: Request MES microcode during IP discovery
>    drm/amd: Request GFX9 microcode during IP discovery
>    drm/amd: Request GFX10 microcode during IP discovery
>    drm/amd: Request GFX11 microcode during IP discovery
>    drm/amd: Request PSP microcode during IP discovery
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   8 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 590 +++++++++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   6 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |   2 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c      |   9 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h      |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     | 208 ++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |  85 +--
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        | 180 +-----
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  64 +-
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         | 143 +----
>   drivers/gpu/drm/amd/amdgpu/mes_v10_1.c        |  28 -
>   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c        |  25 +-
>   drivers/gpu/drm/amd/amdgpu/psp_v10_0.c        | 106 +---
>   drivers/gpu/drm/amd/amdgpu/psp_v11_0.c        | 165 +----
>   drivers/gpu/drm/amd/amdgpu/psp_v12_0.c        | 102 +--
>   drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  82 ---
>   drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c      |  36 --
>   drivers/gpu/drm/amd/amdgpu/psp_v3_1.c         |  36 --
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  61 +-
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  42 +-
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  65 +-
>   drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  30 +-
>   .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  35 +-
>   .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    |  12 +-
>   25 files changed, 919 insertions(+), 1203 deletions(-)
> 
> 
> base-commit: de9a71e391a92841582ca3008e7b127a0b8ccf41

