[PATCH v2 00/11] Recover from failure to probe GPU
Lazar, Lijo
lijo.lazar at amd.com
Tue Jan 3 10:10:26 UTC 2023
On 12/28/2022 10:00 PM, Mario Limonciello wrote:
> One of the first things that KMS drivers do during initialization is
> destroy the system firmware framebuffer by means of
> `drm_aperture_remove_conflicting_pci_framebuffers`.
>
> This means that if for any reason the GPU fails to probe, the user
> will be stuck with, at best, a screen frozen at the last thing that
> was shown before the KMS driver continued its probe.
>
> The problem is most pronounced when new GPU support is introduced
> because users will need to have a recent linux-firmware snapshot
> on their system when they boot a kernel with matching support.
>
> However, the problem is further exacerbated in the case of amdgpu because
> it has migrated to "IP discovery" where amdgpu will attempt to load
> on "ALL" AMD GPUs even if the driver is missing support for IP blocks
> contained in that GPU.
>
> IP discovery requires some probing and isn't run until after the
> framebuffer has been destroyed.
>
> This means a situation can occur where a user purchases a new GPU not
> yet supported by a distribution and when booting the installer it will
> "freeze" even if the distribution doesn't have the matching kernel support
> for those IP blocks.
>
> The perfect example of this is Ubuntu 22.10 and the new dGPUs just
> launched by AMD. The installation media ships with kernel 5.19 (which
> has IP discovery) but the amdgpu support for those IP blocks landed in
> kernel 6.0. The matching linux-firmware was released after 22.10's launch.
> The screen will freeze without nomodeset. Even if a user manages to install
> and then upgrades to kernel 6.0 after install they'll still have the
> problem of missing firmware, and the same experience.
>
> This is quite jarring for users, particularly if they don't know
> that they have to use "nomodeset" to install.
>
> To help the situation, make the following changes to GPU discovery:
> 1) Delay releasing the firmware framebuffer until after IP discovery has
> completed. This will help the situation of an older kernel that doesn't
> yet support the IP blocks probing a new GPU.
> 2) Request loading all PSP, VCN, SDMA, MES and GC microcode into memory
> during IP discovery. This will help the situation of a kernel that is new
> enough for the IP discovery phase to otherwise pass but is missing
> microcode from linux-firmware.git.
>
> Not all requested firmware will be loaded during IP discovery as some of it
> will require larger driver architecture changes. For example SMU firmware
> isn't loaded on certain products, but that's not known until later on when
> the early_init phase of the SMU load occurs.
>
> v1->v2:
> * Take the suggestion from the v1 thread to delay the framebuffer release
> until IP discovery is done. This patch is CC'd to stable so that older
> stable kernels with IP discovery won't try to probe unknown IP.
> * Drop changes to drm aperture.
> * Fetch SDMA, VCN, MES, GC and PSP microcode during IP discovery.
>
What is the gain here in just checking if firmware files are available?
It can fail anywhere during sw_init and it's the same situation.
Restricting IP FWs to IP specific files looks better to me than
centralizing and creating interdependencies.
Thanks,
Lijo
> Mario Limonciello (11):
> drm/amd: Delay removal of the firmware framebuffer
> drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
> drm/amd: Convert SMUv11 microcode init to use
> `amdgpu_ucode_ip_version_decode`
> drm/amd: Convert SMU v13 to use `amdgpu_ucode_ip_version_decode`
> drm/amd: Request SDMA microcode during IP discovery
> drm/amd: Request VCN microcode during IP discovery
> drm/amd: Request MES microcode during IP discovery
> drm/amd: Request GFX9 microcode during IP discovery
> drm/amd: Request GFX10 microcode during IP discovery
> drm/amd: Request GFX11 microcode during IP discovery
> drm/amd: Request PSP microcode during IP discovery
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 590 +++++++++++++++++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 9 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 208 ++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 85 +--
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 180 +-----
> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 64 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 143 +----
> drivers/gpu/drm/amd/amdgpu/mes_v10_1.c | 28 -
> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 25 +-
> drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 106 +---
> drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 165 +----
> drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 102 +--
> drivers/gpu/drm/amd/amdgpu/psp_v13_0.c | 82 ---
> drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c | 36 --
> drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 36 --
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 61 +-
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 42 +-
> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 65 +-
> drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 30 +-
> .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 35 +-
> .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 12 +-
> 25 files changed, 919 insertions(+), 1203 deletions(-)
>
>
> base-commit: de9a71e391a92841582ca3008e7b127a0b8ccf41