[PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

Deucher, Alexander Alexander.Deucher at amd.com
Wed Apr 19 13:16:39 UTC 2023


[Public]

> -----Original Message-----
> From: Guilherme G. Piccoli <gpiccoli at igalia.com>
> Sent: Tuesday, April 18, 2023 6:15 PM
> To: stable at vger.kernel.org
> Cc: gregkh at linuxfoundation.org; sashal at kernel.org; amd-
> gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Zhu, James <James.Zhu at amd.com>; Liu,
> Leo <Leo.Liu at amd.com>; kernel at gpiccoli.net; kernel-dev at igalia.com
> Subject: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh
> broken BIOSes
> 
> commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
> 
> The VCN firmware loading path enables the indirect SRAM mode if it's
> advertised as supported. We might have some cases of FW issues that
> prevents this mode to working properly though, ending-up in a failed probe.
> An example below, observed in the Steam Deck:
> 
> [...]
> [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command
> LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu
> 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]]
> *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0:
> amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal
> error during GPU init [...]
> 
> Disabling the VCN block circumvents this, but it's a very invasive workaround
> that turns off the entire feature. So, let's add a quirk on VCN loading that
> checks for known problematic BIOSes on Vangogh, so we can proactively
> disable the indirect SRAM mode and allow the HW proper probe and VCN IP
> block to work fine.
> 
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
> Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram
> mode")
> Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram
> mode")
> Cc: stable at vger.kernel.org
> Cc: James Zhu <James.Zhu at amd.com>
> Cc: Leo Liu <leo.liu at amd.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli at igalia.com>
> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> ---
> 
> Hi folks, this was build/boot tested on Deck. I've also adjusted the context,
> function was reworked on 6.2.
> 
> But what a surprise was for me not see this fix already in 6.1.y, since I've
> CCed stable, and the reason for that is really peculiar:
> 
> 
> $ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432 Leo Liu
> <leo.liu at amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
> 
> $ git describe --contains 82132ecc5432
> v6.2-rc1~124^2~1^2~13
> 
> 
> $ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e Leo Liu
> <leo.liu at amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
> 
> $ git describe --contains 9a8cc8cabc1e
> v6.1-rc8~16^2^2
> 
> 
> This is quite strange for me, we have 2 commit hashes pointing to the
> *same* commit, and each one is present..in a different release !!?!
> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
> misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
> 
> Alex, do you have an idea why sometimes commits from the AMD tree
> appear duplicate in mainline? Specially in different releases, this could cause
> some confusion I guess.

This is pretty common in the drm.  The problem is there is not a good way to get bug fixes into both -next and -fixes without constant back merges.  So the patches end up in -next and if they are important for stable, they go to -fixes too.  We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable.  I don't know of a good way to make this smoother.

Alex


> 
> Thanks in advance,
> 
> 
> Guilherme
> 
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index ce64ca1c6e66..5c1193dd7d88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -26,6 +26,7 @@
> 
>  #include <linux/firmware.h>
>  #include <linux/module.h>
> +#include <linux/dmi.h>
>  #include <linux/pci.h>
>  #include <linux/debugfs.h>
>  #include <drm/drm_drv.h>
> @@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> {
>  	unsigned long bo_size;
>  	const char *fw_name;
> +	const char *bios_ver;
>  	const struct common_firmware_header *hdr;
>  	unsigned char fw_check;
>  	unsigned int fw_shared_size, log_offset; @@ -159,6 +161,21 @@ int
> amdgpu_vcn_sw_init(struct amdgpu_device *adev)
>  		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
>  		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
>  			adev->vcn.indirect_sram = true;
> +		/*
> +		 * Some Steam Deck's BIOS versions are incompatible with
> the
> +		 * indirect SRAM mode, leading to amdgpu being unable to
> get
> +		 * properly probed (and even potentially crashing the
> kernel).
> +		 * Hence, check for these versions here - notice this is
> +		 * restricted to Vangogh (Deck's APU).
> +		 */
> +		bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
> +
> +		if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
> +		     !strncmp("F7A0114", bios_ver, 7))) {
> +			adev->vcn.indirect_sram = false;
> +			dev_info(adev->dev,
> +				"Steam Deck quirk: indirect SRAM disabled on
> BIOS %s\n", bios_ver);
> +		}
>  		break;
>  	case IP_VERSION(3, 0, 16):
>  		fw_name = FIRMWARE_DIMGREY_CAVEFISH;
> --
> 2.40.0


More information about the amd-gfx mailing list