amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13

PGNet Dev pgnet.dev at gmail.com
Mon Oct 25 13:48:28 UTC 2021


( cc'ing this here, OP -> dri-devel@ )

i've a dual gpu system

	inxi -GS
		System:    Host: ws05 Kernel: 5.14.13-200.fc34.x86_64 x86_64 bits: 64 Console: tty pts/0
		           Distro: Fedora release 34 (Thirty Four)
(1)		Graphics:  Device-1: NVIDIA GK208B [GeForce GT 710] driver: nvidia v: 470.74
(2)		           Device-2: Advanced Micro Devices [AMD/ATI] Cezanne driver: N/A
		           Display: server: X.org 1.20.11 driver: loaded: nvidia unloaded: fbdev,modesetting,vesa
		           Message: Advanced graphics data unavailable for root.

running on

	cpu:    Ryzen 5 5600G
	mobo:   ASRockRack X470D4U
	bios:   vP4.20, 04/14/2021
	kernel: 5.14.13-200.fc34.x86_64 x86_64

where,

	the nvidia is a PCIe card
	the amdgpu is the Ryzen-integrated gpu

the nvidia PCIe card is currently my primary gpu;
it's screen-attached, and boots/functions correctly

	lsmod | grep nvidia
		nvidia_drm             69632  0
		nvidia_modeset       1200128  1 nvidia_drm
		nvidia              35332096  1 nvidia_modeset
		drm_kms_helper        303104  2 amdgpu,nvidia_drm
		drm                   630784  8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm

	dmesg | grep -i nvidia
		[    5.755494] nvidia: loading out-of-tree module taints kernel.
		[    5.755503] nvidia: module license 'NVIDIA' taints kernel.
		[    5.759769] nvidia: module verification failed: signature and/or required key missing - tainting kernel
		[    5.774894] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
		[    5.775299] nvidia 0000:10:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
		[    5.975449] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.74  Mon Sep 13 23:09:15 UTC 2021
		[    6.013181] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.74  Mon Sep 13 22:59:50 UTC 2021
		[    6.016444] [drm] [nvidia-drm] [GPU ID 0x00001000] Loading driver
		[    6.227295] caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
		[    6.954906] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:00.0 on minor 0
		[   16.820758] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input13
		[   16.820776] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input14
		[   16.820808] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input15
		[   16.820826] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input16
		[   16.820841] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input17

the amdgpu is not (currently/yet) in use; no attached screen

in BIOS, currently,

	'PCI Express' (nvidia gpu) is selected as primary
	'HybridGraphics' is enabled
	'OnBoard VGA' is enabled


on boot, the amdgpu modules are loaded

	lsmod | grep gpu
		amdgpu               7802880  0
		drm_ttm_helper         16384  1 amdgpu
		ttm                    81920  2 amdgpu,drm_ttm_helper
		iommu_v2               24576  1 amdgpu
		gpu_sched              45056  1 amdgpu
		drm_kms_helper        303104  2 amdgpu,nvidia_drm
		drm                   630784  8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm
		i2c_algo_bit           16384  2 igb,amdgpu

but dmesg shows a 'Fatal error during GPU init' and a failed probe,

	dmesg | grep -i amdgpu
		[    5.161923] [drm] amdgpu kernel modesetting enabled.
		[    5.162097] amdgpu: Virtual CRAT table created for CPU
		[    5.162104] amdgpu: Topology: Add CPU node
		[    5.162197] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
		[    5.162232] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
		[    5.169105] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
		[    5.174413] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
		[    5.174415] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
		[    5.174416] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
		[    5.174425] Modules linked in: amdgpu(+) uas usb_storage fjes(-) raid1 drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul igb crc32c_intel cec ghash_clmulni_intel drm sp5100_tco dca ccp i2c_algo_bit wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
		[    5.174463]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
		[    5.174594]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
		[    5.174706]  amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
		[    5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22
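for reference, the failing device and error code can be pulled out of the log with a small filter. this is only a sketch: the function name `amdgpu_failed_probes` is made up for illustration, and it assumes the "probe of ... failed with error ..." wording shown above. the -22 it reports here corresponds to -EINVAL.

```shell
# Hypothetical helper (name is an assumption, not part of any tool):
# reads kernel log text on stdin and prints "<pci-address> <error>"
# for every amdgpu probe failure line it finds.
amdgpu_failed_probes() {
    sed -n 's/.*amdgpu: probe of \([0-9a-f:.]*\) failed with error \(-[0-9]*\).*/\1 \2/p'
}
```

usage would be `dmesg | amdgpu_failed_probes`, or the same against a saved log file on stdin.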


are specific configs from

	https://www.kernel.org/doc/html/latest/gpu/amdgpu.html

required to avoid/work around the init error?  or is this a known bug?
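to show which amdgpu options, if any, are already set at boot: module parameters can be given on the kernel command line as `<module>.<param>=<value>`, so a minimal check is to list the `amdgpu.` entries. this is a sketch only: the function name `amdgpu_cmdline_params` is hypothetical, and it assumes space-separated parameters with no quoted values containing spaces.

```shell
# Hypothetical helper (name is an assumption): reads a kernel command line
# on stdin and prints each amdgpu.<param>=<value> entry on its own line.
# Prints nothing (and returns non-zero) when no such entries are present.
amdgpu_cmdline_params() {
    tr ' ' '\n' | grep '^amdgpu\.'
}
```

usage would be `amdgpu_cmdline_params < /proc/cmdline`; `modinfo -p amdgpu` lists the parameters the installed module accepts.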

