[PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
Deucher, Alexander
Alexander.Deucher at amd.com
Wed Apr 19 13:16:39 UTC 2023
[Public]
> -----Original Message-----
> From: Guilherme G. Piccoli <gpiccoli at igalia.com>
> Sent: Tuesday, April 18, 2023 6:15 PM
> To: stable at vger.kernel.org
> Cc: gregkh at linuxfoundation.org; sashal at kernel.org; amd-
> gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Zhu, James <James.Zhu at amd.com>; Liu,
> Leo <Leo.Liu at amd.com>; kernel at gpiccoli.net; kernel-dev at igalia.com
> Subject: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh
> broken BIOSes
>
> commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
>
> The VCN firmware loading path enables the indirect SRAM mode if it's
> advertised as supported. We might have some cases of FW issues that
> prevents this mode to working properly though, ending-up in a failed probe.
> An example below, observed in the Steam Deck:
>
> [...]
> [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command
> LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu
> 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]]
> *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0:
> amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal
> error during GPU init [...]
>
> Disabling the VCN block circumvents this, but it's a very invasive workaround
> that turns off the entire feature. So, let's add a quirk on VCN loading that
> checks for known problematic BIOSes on Vangogh, so we can proactively
> disable the indirect SRAM mode and allow the HW proper probe and VCN IP
> block to work fine.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
> Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram
> mode")
> Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram
> mode")
> Cc: stable at vger.kernel.org
> Cc: James Zhu <James.Zhu at amd.com>
> Cc: Leo Liu <leo.liu at amd.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli at igalia.com>
> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> ---
>
> Hi folks, this was build/boot tested on Deck. I've also adjusted the context,
> function was reworked on 6.2.
>
> But what a surprise was for me not see this fix already in 6.1.y, since I've
> CCed stable, and the reason for that is really peculiar:
>
>
> $ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432 Leo Liu
> <leo.liu at amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
>
> $ git describe --contains 82132ecc5432
> v6.2-rc1~124^2~1^2~13
>
>
> $ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e Leo Liu
> <leo.liu at amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
>
> $ git describe --contains 9a8cc8cabc1e
> v6.1-rc8~16^2^2
>
>
> This is quite strange for me, we have 2 commit hashes pointing to the
> *same* commit, and each one is present..in a different release !!?!
> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
> misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
>
> Alex, do you have an idea why sometimes commits from the AMD tree
> appear duplicate in mainline? Specially in different releases, this could cause
> some confusion I guess.
This is pretty common in the drm. The problem is there is not a good way to get bug fixes into both -next and -fixes without constant back merges. So the patches end up in -next and if they are important for stable, they go to -fixes too. We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable. I don't know of a good way to make this smoother.
Alex
>
> Thanks in advance,
>
>
> Guilherme
>
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index ce64ca1c6e66..5c1193dd7d88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -26,6 +26,7 @@
>
> #include <linux/firmware.h>
> #include <linux/module.h>
> +#include <linux/dmi.h>
> #include <linux/pci.h>
> #include <linux/debugfs.h>
> #include <drm/drm_drv.h>
> @@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> {
> unsigned long bo_size;
> const char *fw_name;
> + const char *bios_ver;
> const struct common_firmware_header *hdr;
> unsigned char fw_check;
> unsigned int fw_shared_size, log_offset; @@ -159,6 +161,21 @@ int
> amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> adev->vcn.indirect_sram = true;
> + /*
> + * Some Steam Deck's BIOS versions are incompatible with
> the
> + * indirect SRAM mode, leading to amdgpu being unable to
> get
> + * properly probed (and even potentially crashing the
> kernel).
> + * Hence, check for these versions here - notice this is
> + * restricted to Vangogh (Deck's APU).
> + */
> + bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
> +
> + if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
> + !strncmp("F7A0114", bios_ver, 7))) {
> + adev->vcn.indirect_sram = false;
> + dev_info(adev->dev,
> + "Steam Deck quirk: indirect SRAM disabled on
> BIOS %s\n", bios_ver);
> + }
> break;
> case IP_VERSION(3, 0, 16):
> fw_name = FIRMWARE_DIMGREY_CAVEFISH;
> --
> 2.40.0
More information about the amd-gfx
mailing list