[Bug 201957] amdgpu: ring gfx timeout

bugzilla-daemon at bugzilla.kernel.org
Wed Dec 22 20:33:07 UTC 2021


https://bugzilla.kernel.org/show_bug.cgi?id=201957

roman (coolx67 at gmx.at) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |coolx67 at gmx.at

--- Comment #52 from roman (coolx67 at gmx.at) ---
I can confirm that booting with
amdgpu.dpm=0
removes the issue on an AMD Radeon PRO FIJI (Dual Fury).
Kernel: 5.15.10 | FW: 20211027.1d00989-1 | Mesa: 21.3.2-1
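
For anyone who wants to double-check that the workaround is really active, here
is a minimal Python sketch. It assumes the parameter was passed on the kernel
command line (/proc/cmdline) and that the amdgpu module exports it under
/sys/module/amdgpu/parameters/dpm; both helper names are made up for this
illustration.

```python
from pathlib import Path


def dpm_disabled_on_cmdline(cmdline):
    """Return True if the kernel command line explicitly sets amdgpu.dpm=0."""
    return "amdgpu.dpm=0" in cmdline.split()


def amdgpu_dpm_setting(sysfs_root="/sys"):
    """Return the amdgpu 'dpm' module parameter as an int, or None if unreadable.

    Assumption: the parameter is exported at
    <sysfs_root>/module/amdgpu/parameters/dpm.
    """
    param = Path(sysfs_root) / "module" / "amdgpu" / "parameters" / "dpm"
    try:
        return int(param.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("amdgpu.dpm=0 on cmdline:", dpm_disabled_on_cmdline(cmdline))
    print("module parameter value:", amdgpu_dpm_setting())
```

If the module parameter cannot be read (module not loaded, different path), the
second helper returns None rather than guessing.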

The desktop works perfectly fine in GNOME as long as no application accesses
the second GPU.

In Radeon-profile there is no issue as long as card0 is selected, but as soon
as I select card1 I instantly get:
Dec 22 21:15:46 Workstation kernel: amdgpu: 
                                     failed to send message 171 ret is 0 
Dec 22 21:15:49 Workstation kernel: amdgpu: 
                                     last message was failed ret is 0

Radeon-profile itself freezes, but the desktop remains responsive.
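
To count how often this happens, a rough Python sketch like the one below can
scan saved journal output for the failure line. The function name and the
regular expression are my own and only match the wording shown above; note that
the message text sits on a continuation line, so the sketch searches the whole
text instead of going line by line.

```python
import re

# Matches e.g. "failed to send message 171 ret is 0" as printed above.
SEND_FAILURE = re.compile(r"failed to send message (\w+) ret is (-?\w+)")


def find_send_failures(log_text):
    """Return a list of (message_id, ret) string pairs found in log_text."""
    return [(m.group(1), m.group(2)) for m in SEND_FAILURE.finditer(log_text)]


if __name__ == "__main__":
    import sys

    # Usage idea: journalctl -k | python3 this_script.py
    for message_id, ret in find_send_failures(sys.stdin.read()):
        print("failed message", message_id, "ret", ret)
```

The follow-up "last message was failed" line is deliberately not matched, since
it carries no message number.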



When launching CS:GO with MangoHud, I configured either

pci_dev = 0000:3d:00.0 # primary card: works fine
or
pci_dev = 0000:3e:00.0 # secondary card: the errors above occur, CS:GO
loads very slowly, and it hangs once the menu is visible
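
To see which PCI address belongs to card0 and card1, the sketch below walks
/sys/class/drm and resolves each card's "device" symlink. This is only an
illustration: the function name is invented, and it assumes the usual sysfs
layout in which that symlink points at the PCI device directory.

```python
import os
import re

CARD_NAME = re.compile(r"^card\d+$")


def drm_cards_to_pci(drm_root="/sys/class/drm"):
    """Map DRM card names (card0, card1, ...) to PCI addresses."""
    mapping = {}
    try:
        entries = sorted(os.listdir(drm_root))
    except OSError:
        return mapping  # no DRM class directory at this path
    for name in entries:
        if not CARD_NAME.match(name):
            continue  # skip connectors such as card0-DP-1 and render nodes
        device = os.path.join(drm_root, name, "device")
        if not os.path.exists(device):
            continue
        # The resolved symlink ends in the address, e.g. 0000:3d:00.0
        mapping[name] = os.path.basename(os.path.realpath(device))
    return mapping


if __name__ == "__main__":
    for card, address in drm_cards_to_pci().items():
        print(card, "->", address)
```

The printed address is in the same domain:bus:device.function form that the
pci_dev lines above use.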

With CSM disabled in the BIOS, both GPUs are available:

Dec 22 20:45:50 Workstation kernel: [drm] amdgpu kernel modesetting enabled.
Dec 22 20:45:50 Workstation kernel: amdgpu: CRAT table not found
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for CPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add CPU node
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: vgaarb: deactivate vga
console
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: enabling device (0106
-> 0107)
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 22 20:45:50 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801MS-102
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:50 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 22 20:45:50 Workstation kernel: snd_hda_intel 0000:3d:00.1: bound
0000:3d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dec 22 20:45:50 Workstation kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR*
retrieve_link_cap: Read receiver caps dpcd data failed.
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:50 Workstation kernel: fbcon: amdgpu (fb0) is primary device
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3d:00.0: [drm] fb0: amdgpu
frame buffer device
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Using BACO for
runtime pm
Dec 22 20:45:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for 0000:3d:00.0 on minor 0
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: enabling device (0106
-> 0107)
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 22 20:45:51 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801SL-102
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 22 20:45:51 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:51 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:51 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 22 20:45:51 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 22 20:45:51 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:51 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 22 20:45:51 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Using BACO for
runtime pm
Dec 22 20:45:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for 0000:3e:00.0 on minor 1
Dec 22 20:45:53 Workstation gnome-shell[1988]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 22 20:45:53 Workstation gnome-shell[1988]: Added device '/dev/dri/card1'
(amdgpu) using atomic mode setting.
Dec 22 20:45:55 Workstation gnome-shell[1988]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.
Dec 22 20:46:03 Workstation gnome-shell[2527]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 22 20:46:04 Workstation gnome-shell[2527]: Added device '/dev/dri/card1'
(amdgpu) using atomic mode setting.
Dec 22 20:46:05 Workstation gnome-shell[2527]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.


With CSM enabled, only the primary GPU is available:
Dec 17 18:17:51 Workstation kernel: [drm] amdgpu kernel modesetting enabled.
Dec 17 18:17:51 Workstation kernel: amdgpu: CRAT table not found
Dec 17 18:17:51 Workstation kernel: amdgpu: Virtual CRAT table created for CPU
Dec 17 18:17:51 Workstation kernel: amdgpu: Topology: Add CPU node
Dec 17 18:17:51 Workstation kernel: fb0: switching to amdgpu from EFI VGA
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: vgaarb: deactivate vga
console
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: No more image in the
PCI ROM
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 17 18:17:51 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801MS-102
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 2: releasing [mem
0xb0000000-0xb01fffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 0: releasing [mem
0xa0000000-0xafffffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 0: assigned [mem
0x388000000000-0x3880ffffffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 2: assigned [mem
0x388100000000-0x3881001fffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 17 18:17:51 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 17 18:17:51 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 17 18:17:51 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 17 18:17:51 Workstation kernel: snd_hda_intel 0000:3d:00.1: bound
0000:3d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dec 17 18:17:51 Workstation kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR*
retrieve_link_cap: Read receiver caps dpcd data failed.
Dec 17 18:17:51 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 17 18:17:51 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 17 18:17:51 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 17 18:17:51 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 17 18:17:51 Workstation kernel: fbcon: amdgpu (fb0) is primary device
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: [drm] fb0: amdgpu
frame buffer device
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Using BACO for
runtime pm
Dec 17 18:17:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for 0000:3d:00.0 on minor 0
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3e:00.0: enabling device (0100
-> 0103)
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 17 18:17:52 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801SL-102
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: BAR 2: releasing [???
0x00000000 flags 0x0]
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: BAR 0: releasing [???
0x00000000 flags 0x0]
Dec 17 18:17:52 Workstation kernel: [drm:amdgpu_device_resize_fb_bar [amdgpu]]
*ERROR* Problem resizing BAR0 (-16).
Dec 17 18:17:52 Workstation kernel: [drm:amdgpu_device_init.cold [amdgpu]]
*ERROR* sw_init of IP block <gmc_v8_0> failed -19
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu:
amdgpu_device_ip_init failed
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Fatal error
during GPU init
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: amdgpu:
finishing device.
Dec 17 18:18:00 Workstation gnome-shell[1921]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 17 18:18:02 Workstation gnome-shell[1921]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.
Dec 17 18:18:13 Workstation gnome-shell[2410]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 17 18:18:14 Workstation gnome-shell[2410]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.
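
The -16 and -19 in the failing BAR resize and gmc_v8_0 sw_init lines are
negated errno values, as is usual for kernel return codes. A small Python
sketch to translate them (the helper name is made up; the names come from the
errno table of the machine running the script, so run it on Linux):

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a negative kernel return code into (errno name, description)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    for code in (-16, -19):
        name, text = describe_kernel_error(code)
        print(code, name, text)
```

On Linux these two should come out as EBUSY and ENODEV respectively.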

Hopefully @Alex can look into this or forward it, since this is a P1 blocking
issue that has been open for 3 years.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

