amdgpu "Fatal error during GPU init"; Ryzen 5600G integrated GPU + kernel 5.14.13

Lazar, Lijo lijo.lazar at amd.com
Mon Oct 25 15:15:44 UTC 2021



On 10/25/2021 7:45 PM, Alex Deucher wrote:
> On Mon, Oct 25, 2021 at 9:48 AM PGNet Dev <pgnet.dev at gmail.com> wrote:
>>
>> ( cc'ing this here, OP -> dri-devel@ )
>>
>> i've a dual gpu system
>>
>>          inxi -GS
>>                  System:    Host: ws05 Kernel: 5.14.13-200.fc34.x86_64 x86_64 bits: 64 Console: tty pts/0
>>                             Distro: Fedora release 34 (Thirty Four)
>> (1)             Graphics:  Device-1: NVIDIA GK208B [GeForce GT 710] driver: nvidia v: 470.74
>> (2)                        Device-2: Advanced Micro Devices [AMD/ATI] Cezanne driver: N/A
>>                             Display: server: X.org 1.20.11 driver: loaded: nvidia unloaded: fbdev,modesetting,vesa
>>                             Message: Advanced graphics data unavailable for root.
>>
>> running on
>>
>>          cpu:    Ryzen 5 5600G
>>          mobo:   ASRockRack X470D4U
>>          bios:   vP4.20, 04/14/2021
>>          kernel: 5.14.13-200.fc34.x86_64 x86_64
>>
>> where,
>>
>>          the nvidia is a PCIe card
>>          the amdgpu is the Ryzen-integrated gpu
>>
>> the nvidia PCI is currently my primary
>> it's screen-attached, and boots/functions correctly
>>
>>          lsmod | grep nvidia
>>                  nvidia_drm             69632  0
>>                  nvidia_modeset       1200128  1 nvidia_drm
>>                  nvidia              35332096  1 nvidia_modeset
>>                  drm_kms_helper        303104  2 amdgpu,nvidia_drm
>>                  drm                   630784  8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm
>>
>>          dmesg | grep -i nvidia
>>                  [    5.755494] nvidia: loading out-of-tree module taints kernel.
>>                  [    5.755503] nvidia: module license 'NVIDIA' taints kernel.
>>                  [    5.759769] nvidia: module verification failed: signature and/or required key missing - tainting kernel
>>                  [    5.774894] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
>>                  [    5.775299] nvidia 0000:10:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
>>                  [    5.975449] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.74  Mon Sep 13 23:09:15 UTC 2021
>>                  [    6.013181] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.74  Mon Sep 13 22:59:50 UTC 2021
>>                  [    6.016444] [drm] [nvidia-drm] [GPU ID 0x00001000] Loading driver
>>                  [    6.227295] caller _nv000723rm+0x1ad/0x200 [nvidia] mapping multiple BARs
>>                  [    6.954906] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:10:00.0 on minor 0
>>                  [   16.820758] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input13
>>                  [   16.820776] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input14
>>                  [   16.820808] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input15
>>                  [   16.820826] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input16
>>                  [   16.820841] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:10:00.1/sound/card0/input17
>>
>> the amdgpu is not (currently/yet) in use; no attached screen
>>
>> in BIOS, currently,
>>
>>          'PCI Express' (nvidia gpu) is selected as primary
>>          'HybridGraphics' is enabled
>>          'OnBoard VGA' is enabled
>>
>>
>> on boot, mods are loaded
>>
>>          lsmod | grep gpu
>>                  amdgpu               7802880  0
>>                  drm_ttm_helper         16384  1 amdgpu
>>                  ttm                    81920  2 amdgpu,drm_ttm_helper
>>                  iommu_v2               24576  1 amdgpu
>>                  gpu_sched              45056  1 amdgpu
>>                  drm_kms_helper        303104  2 amdgpu,nvidia_drm
>>                  drm                   630784  8 gpu_sched,drm_kms_helper,nvidia,amdgpu,drm_ttm_helper,nvidia_drm,ttm
>>                  i2c_algo_bit           16384  2 igb,amdgpu
>>
>> but i see a 'fatal error' and 'failed' probe,
>>
>>          dmesg | grep -i amdgpu
>>                  [    5.161923] [drm] amdgpu kernel modesetting enabled.
>>                  [    5.162097] amdgpu: Virtual CRAT table created for CPU
>>                  [    5.162104] amdgpu: Topology: Add CPU node
>>                  [    5.162197] amdgpu 0000:30:00.0: enabling device (0000 -> 0003)
>>                  [    5.162232] amdgpu 0000:30:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
>>                  [    5.169105] amdgpu 0000:30:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
>>                  [    5.174413] amdgpu 0000:30:00.0: amdgpu: Unable to locate a BIOS ROM
>>                  [    5.174415] amdgpu 0000:30:00.0: amdgpu: Fatal error during GPU init
>>                  [    5.174416] amdgpu 0000:30:00.0: amdgpu: amdgpu: finishing device.
>>                  [    5.174425] Modules linked in: amdgpu(+) uas usb_storage fjes(-) raid1 drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper crct10dif_pclmul crc32_pclmul igb crc32c_intel cec ghash_clmulni_intel drm sp5100_tco dca ccp i2c_algo_bit wmi video sunrpc tcp_bbr nct6775 hwmon_vid k10temp
>>                  [    5.174463]  amdgpu_device_fini_hw+0x33/0x2c5 [amdgpu]
>>                  [    5.174594]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
>>                  [    5.174706]  amdgpu_pci_probe+0x110/0x1a0 [amdgpu]
>>                  [    5.174907] amdgpu: probe of 0000:30:00.0 failed with error -22
>>
>>
>> are specific configs from
>>
>>          https://www.kernel.org/doc/html/latest/gpu/amdgpu.html
>>
>> required to avoid/workaround the init error?  or known bug?
> 
> The driver is not able to find the vbios image which is required for
> the driver to properly enumerate the hardware.  I would guess it's a
> platform issue.  Is there a newer sbios image available for your
> platform?  You might try that or check if there are any options in the
> sbios regarding the behavior of the integrated graphics when an
> external GPU is present.  I suspect one of the following is the
> problem:
> 1. The sbios should disable the integrated graphics when a dGPU is
> present, but due to a bug in the sbios or a particular sbios setting
> it has failed to.
> 2. The sbios should be providing a vbios image for the integrated
> graphics, but due to a bug in the sbios or a particular sbios setting
> it has failed to.
> 3. The platform uses some alternative method to provide access to the
> vbios image for the integrated graphics that Linux does not yet
> handle.
> 
To add to the list - check if ACPI support is broken or skipped.
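
A minimal sketch of one way to run that check, in Python. It rests on
two assumptions that are not stated in this thread: that amdgpu can
obtain the integrated GPU's vbios from the ACPI VFCT table, and that
the kernel lists the firmware's ACPI tables under
/sys/firmware/acpi/tables. The helper names are hypothetical, and a
missing VFCT entry is only a hint, not a diagnosis.

```python
from pathlib import Path


def acpi_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains acpi=off."""
    return "acpi=off" in cmdline.split()


def has_vfct_table(tables_dir: Path) -> bool:
    """Return True if a VFCT entry exists in the given ACPI tables directory.

    Assumption: VFCT is the table through which the firmware publishes
    the vbios image for an integrated AMD GPU.
    """
    return (tables_dir / "VFCT").exists()


def main() -> None:
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        cmdline = ""

    if acpi_disabled(cmdline):
        print("ACPI is disabled on the kernel command line (acpi=off)")
    elif has_vfct_table(Path("/sys/firmware/acpi/tables")):
        print("VFCT table present; firmware appears to publish a vbios image")
    else:
        print("no VFCT table found; firmware may not provide a vbios via ACPI")


if __name__ == "__main__":
    main()
```

If no VFCT table shows up, that would point toward the sbios side
(case 1 or 2 in Alex's list) rather than toward the driver.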

Thanks,
Lijo

> I would start with an sbios update if possible.
> 
> Alex
> 

