amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13
Lazar, Lijo
lijo.lazar at amd.com
Mon Oct 25 15:15:44 UTC 2021
On 10/25/2021 7:45 PM, Alex Deucher wrote:
> On Mon, Oct 25, 2021 at 9:48 AM PGNet Dev <pgnet.dev at gmail.com> wrote:
>>
>> ( cc'ing this here, OP -> dri-devel@ )
>>
>> i've a dual gpu system
>>
>> inxi -GS
>> System: Host: ws05 Kernel: 5.14.13-200.fc34.x86_64 x86_64 bits: 64 Console: tty pts/0
>> Distro: Fedora release 34 (Thirty Four)
>> (1) Graphics: Device-1: NVIDIA GK208B [GeForce GT 710] driver: nvidia v: 470.74
>> (2) Device-2: Advanced Micro Devices [AMD/ATI] Cezanne driver: N/A
>> Display: server: X.org 1.20.11 driver: loaded: nvidia unloaded: fbdev,modesetting,vesa
>> Message: Advanced graphics data unavailable for root.
>>
>> running on
>>
>> cpu: Ryzen 5 5600G
>> mobo: ASRockRack X470D4U
>> bios: vP4.20, 04/14/2021
>> kernel: 5.14.13-200.fc34.x86_64 x86_64
>>
>> where,
>>
>> the nvidia is a PCIe card
>> the amdgpu is the Ryzen-integrated gpu
>>
>> the nvidia PCI is currently my primary
>> it's screen-attached, and boots/functions correctly
>>
>> lsmod | grep nvidia
>> nvidia_drm 69632 0
>> nvidia_modeset 1200128 1 nvidia_drm
>> nvidia 35332096 1 nvidia_modeset
>> drm_kms_helper 303104 2 amdgpu,nvidia_drm
>> drm 630784 8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm
>>
>> dmesg | grep -i nvidia
>> [ 5.755494] nvidia: loading out-of-tree module taints kernel.
>> [ 5.755503] nvidia: module license 'NVIDIA' taints kernel.
>> [ 5.759769] nvidia: module verification failed: signature and/or required key missing - tainting kernel
>> [ 5.774894] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
>> [ 5.775299] nvidia 0000:10:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
>> [ 5.975449] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 470.74 Mon Sep 13 23:09:15 UTC 2021
>> [ 6.013181] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 470.74 Mon Sep 13 22:59:50 UTC 2021
>> [ 6.016444] [drm] [nvidia-drm] [GPU ID 0x00001000] Loading driver
>> [ 6.227295] caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
>> [ 6.954906] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:00.0 on minor 0
>> [ 16.820758] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input13
>> [ 16.820776] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input14
>> [ 16.820808] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input15
>> [ 16.820826] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input16
>> [ 16.820841] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input17
>>
>> the amdgpu is not (currently/yet) in use; no attached screen
>>
>> in BIOS, currently,
>>
>> 'PCI Express' (nvidia gpu) is selected as primary
>> 'HybridGraphics' is enabled
>> 'OnBoard VGA' is enabled
>>
>>
>> on boot, mods are loaded
>>
>> lsmod | grep gpu
>> amdgpu 7802880 0
>> drm_ttm_helper 16384 1 amdgpu
>> ttm 81920 2 amdgpu,drm_ttm_helper
>> iommu_v2 24576 1 amdgpu
>> gpu_sched 45056 1 amdgpu
>> drm_kms_helper 303104 2 amdgpu,nvidia_drm
>> drm 630784 8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm
>> i2c_algo_bit 16384 2 igb,amdgpu
>>
>> but i see a 'fatal error' and 'failed' probe,
>>
>> dmesg | grep -i amdgpu
>> [ 5.161923] [drm] amdgpu kernel modesetting enabled.
>> [ 5.162097] amdgpu: Virtual CRAT table created for CPU
>> [ 5.162104] amdgpu: Topology: Add CPU node
>> [ 5.162197] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
>> [ 5.162232] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
>> [ 5.169105] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
>> [ 5.174413] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
>> [ 5.174415] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
>> [ 5.174416] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
>> [ 5.174425] Modules linked in: amdgpu(+) uas usb_storage fjes(-) raid1 drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul igb crc32c_intel cec ghash_clmulni_intel drm sp5100_tco dca ccp i2c_algo_bit wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
>> [ 5.174463] amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
>> [ 5.174594] amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
>> [ 5.174706] amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
>> [ 5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22
>>
>>
>> are specific configs from
>>
>> https://www.kernel.org/doc/html/latest/gpu/amdgpu.html
>>
>> required to avoid/workaround the init error? or known bug?
>
> The driver is not able to find the vbios image which is required for
> the driver to properly enumerate the hardware. I would guess it's a
> platform issue. Is there a newer sbios image available for your
> platform? You might try that or check if there are any options in the
> sbios regarding the behavior of the integrated graphics when an
> external GPU is present. I suspect one of the following is the
> problem:
> 1. The sbios should disable the integrated graphics when a dGPU is
> present, but due to a bug in the sbios or a particular sbios setting
> it has failed to.
> 2. The sbios should be providing a vbios image for the integrated
> graphics, but due to a bug in the sbios or a particular sbios setting
> it has failed to.
> 3. The platform uses some alternative method to provide access to the
> vbios image for the integrated graphics that Linux does not yet
> handle.
>
To add to the list - check if ACPI support is broken or skipped.
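
A rough way to check this from userspace, as a sketch only: on APU platforms amdgpu can pick up the vbios image from the ACPI VFCT table, so if ACPI is disabled or the table is not exposed, that lookup path has nothing to read. The helper names below are hypothetical, and the assumption is that the running kernel lists raw ACPI tables under /sys/firmware/acpi/tables.

```python
from pathlib import Path

# Assumption: raw ACPI tables are listed here on the running kernel.
# If ACPI is disabled (e.g. acpi=off), this directory is normally absent.
ACPI_TABLES_DIR = Path("/sys/firmware/acpi/tables")


def has_acpi_table(name, tables_dir=ACPI_TABLES_DIR):
    """Return True if a table file whose name starts with `name` exists.

    Hypothetical helper. Prefix matching also accepts numbered
    duplicates such as VFCT1.
    """
    tables_dir = Path(tables_dir)
    if not tables_dir.is_dir():
        return False
    return any(
        p.is_file() and p.name.startswith(name) for p in tables_dir.iterdir()
    )


def vbios_lookup_failed(dmesg_text):
    """Return True if the text contains amdgpu's missing-vbios message."""
    return any(
        "amdgpu" in line and "Unable to locate a BIOS ROM" in line
        for line in dmesg_text.splitlines()
    )


if __name__ == "__main__":
    print("ACPI tables directory present:", ACPI_TABLES_DIR.is_dir())
    print("VFCT table exposed:", has_acpi_table("VFCT"))
```

If the tables directory is missing altogether, ACPI was most likely skipped at boot. If it is there but no VFCT entry shows up, the sbios is probably not publishing a vbios image through ACPI for the integrated graphics. The saved output of `dmesg` can be passed to vbios_lookup_failed() to confirm the same failure as in the log above.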
Thanks,
Lijo
> I would start with an sbios update if possible.
>
> Alex
>