Amdgpu kernel oops and freezing graphics
Harvey
harv at gmx.de
Mon Jul 20 08:21:54 UTC 2020
Hello,
this is my first post to this list so please be patient with me ;)
The facts:
it is now one week that I own a new laptop, a MSI Bravo 17 A4DDR/MS-17FK
with Ryzen 7 4800U and hybrid graphics on a Radeon RX 5500M. I installed
my beloved Archlinux but I can't start any graphics withpout kernel oops
on it beside the normal console, even calling 'lspci' on the console is
provoking errors.
I am using linux kernel 5.7.9 and linux-firmware 20200619.e96c121
(FWIW: I even tried with a self-cmpiled kernel 5.8-rc5 and
linux-firmware directly from the git repository - no changes)
The following is only part of the information I can provide but I didn't
want to make this mail bigger than it already is.
the lspci -k output is:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Root Complex
Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir IOMMU
Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe
Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP
Bridge
Kernel driver in use: pcieport
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe
Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP
Bridge
Kernel driver in use: pcieport
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP
Bridge
Kernel driver in use: pcieport
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe
Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal
PCIe GPP Bridge to Bus
Kernel driver in use: pcieport
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal
PCIe GPP Bridge to Bus
Kernel driver in use: pcieport
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
(rev 51)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: piix4_smbus
Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
(rev 51)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 3
Kernel driver in use: k10temp
Kernel modules: k10temp
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir Device
24: Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL
Upstream Port of PCI Express Switch (rev c1)
Kernel driver in use: pcieport
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL
Downstream Port of PCI Express Switch
Kernel driver in use: pcieport
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi
14 [Radeon RX 5500/5500M / Pro 5500M] (rev c1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: amdgpu
Kernel modules: amdgpu
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10
HDMI Audio
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
04:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
Subsystem: Intel Corporation Device 0084
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: r8169
Kernel modules: r8169
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Renoir (rev c6)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: amdgpu
Kernel modules: amdgpu
06:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family
17h (Models 10h-1fh) Platform Security Processor
Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models
10h-1fh) Platform Security Processor
Kernel driver in use: ccp
Kernel modules: ccp
06:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
06:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir USB 3.1
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
06:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD]
Raven/Raven2/FireFlight/Renoir Audio Processor (rev 01)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel modules: snd_pci_acp3x
06:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h
(Models 10h-1fh) HD Audio Controller
DeviceName: HD Audio Controller
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
07:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA
Controller [AHCI mode] (rev 81)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: ahci
07:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA
Controller [AHCI mode] (rev 81)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ac
Kernel driver in use: ahci
The output of dmesg -l err,warn is:
[ 0.335758] TSC synchronization [CPU#0 -> CPU#8]:
[ 0.335759] Measured 4524 cycles TSC warp between CPUs, turning off
TSC clock.
[ 0.335763] #9 #10 #11 #12 #13 #14 #15
[ 0.388723] Expanded resource Reserved due to conflict with PCI Bus
0000:00
[ 0.495565] pci 0000:00:00.2: can't derive routing for PCI INT A
[ 0.495566] pci 0000:00:00.2: PCI INT A: not connected
[ 0.497518] PPR X2APIC NX GT IA GA PC GA_vAPIC
[ 0.507325] efifb: Ignoring BGRT: unexpected or invalid BMP data
[ 0.535452] Unstable clock detected, switching default tracing clock
to "global"
If you want to keep using the local clock, then add:
"trace_clock=local"
on the kernel command line
[ 1.822245] soc_button_array ACPI0011:00: Unknown button index 0
upage 01 usage c6, ignoring
[ 1.843906] i2c_hid i2c-PNP0C50:0b: supply vdd not found, using dummy
regulator
[ 1.843924] i2c_hid i2c-PNP0C50:0b: supply vddl not found, using
dummy regulator
[ 1.865149] snd_pci_acp3x 0000:06:00.5: Invalid ACP audio mode : 0
[ 1.914676] sp5100-tco sp5100-tco: Watchdog hardware is disabled
[ 1.933349] platform regulatory.0: Direct firmware load for
regulatory.db failed with error -2
[ 2.057649] iwlwifi 0000:04:00.0: api flags index 2 larger than
supported by driver
[ 2.057914] iwlwifi 0000:04:00.0: Direct firmware load for
iwl-debug-yoyo.bin failed with error -2
[ 2.256189] uvcvideo 1-4:1.0: Entity type for entity Extension 4 was
not initialized!
[ 2.256194] uvcvideo 1-4:1.0: Entity type for entity Extension 3 was
not initialized!
[ 2.256196] uvcvideo 1-4:1.0: Entity type for entity Processing 2 was
not initialized!
[ 2.256198] uvcvideo 1-4:1.0: Entity type for entity Camera 1 was not
initialized!
[ 2.373041] ATPX version 1, functions 0x00000001
[ 2.373173] ATPX Hybrid Graphics
[ 2.393743] [drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table
present but broken (too short #2)
[ 2.615299] thermal thermal_zone1: failed to read out thermal zone (-61)
[ 3.268390] SMU driver if version not matched
[ 3.989144] sos fw version = 0x110d32.
[ 4.160713] SMU driver if version not matched
[ 4.294363] [drm:mod_hdcp_add_display_to_topology [amdgpu]] *ERROR*
Failed to add display topology, DTM TA is not initialized.
[ 4.294366] [drm] [Link 0] WARNING MOD_HDCP_STATUS_FAILURE IN STATE
HDCP_UNINITIALIZED STAY COUNT 0
[ 5.705210] kauditd_printk_skb: 21 callbacks suppressed
[ 13.102593] kauditd_printk_skb: 10 callbacks suppressed
[ 26.297372] kauditd_printk_skb: 13 callbacks suppressed
[ 32.442652] kauditd_printk_skb: 6 callbacks suppressed
[ 415.086058] failed send message: RunBtc (58) param: 0x00000000
response 0xffffffc2
[ 415.086059] RunBtc failed!
[ 415.086101] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
resume of IP block <smu> failed -62
[ 415.086138] [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-62).
[ 415.204513] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP
= 65535
[ 421.356866] ------------[ cut here ]------------
[ 421.357046] WARNING: CPU: 9 PID: 680 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1517
dm_suspend+0x4e/0x60 [amdgpu]
[ 421.357047] Modules linked in: iwlmvm joydev snd_hda_codec_realtek
mousedev amdgpu uvcvideo mac80211 snd_hda_codec_generic
snd_hda_codec_hdmi btusb videobuf2_vmalloc ledtrig_audio
videobuf2_memops btrtl edac_mce_amd btbcm videobuf2_v4l2 btintel kvm_amd
hid_multitouch snd_hda_intel msi_wmi videobuf2_common libarc4 bluetooth
snd_intel_dspcfg hid_generic sparse_keymap kvm snd_hda_codec videodev
nls_iso8859_1 nls_cp437 gpu_sched i2c_algo_bit vfat ecdh_generic fat ttm
irqbypass iwlwifi mc ecc snd_hda_core drm_kms_helper snd_hwdep snd_pcm
crct10dif_pclmul crc32_pclmul r8169 cec ghash_clmulni_intel snd_timer
cfg80211 aesni_intel rc_core syscopyarea sysfillrect snd crypto_simd
sp5100_tco realtek sysimgblt cryptd glue_helper fb_sys_fops psmouse ccp
pcspkr input_leds i2c_piix4 k10temp snd_pci_acp3x soundcore libphy
rfkill wmi battery ac i2c_hid tpm_crb hid tpm_tis tpm_tis_core tpm evdev
mac_hid pinctrl_amd rng_core acpi_cpufreq soc_button_array drm
crypto_user agpgart ip_tables x_tables serio_raw
[ 421.357115] atkbd libps2 xhci_pci xhci_hcd i8042 serio ext4
crc32c_generic crc32c_intel crc16 mbcache jbd2
[ 421.357128] CPU: 9 PID: 680 Comm: kworker/9:2 Not tainted
5.7.9-arch1-1 #1
[ 421.357130] Hardware name: Micro-Star International Co., Ltd. Bravo
17 A4DDR/MS-17FK, BIOS E17FKAMS.113 05/18/2020
[ 421.357138] Workqueue: pm pm_runtime_work
[ 421.357296] RIP: 0010:dm_suspend+0x4e/0x60 [amdgpu]
[ 421.357299] Code: 00 48 89 83 20 47 01 00 e8 ef fd ff ff 48 89 df e8
27 a1 00 00 48 8b bb a0 2c 01 00 be 08 00 00 00 e8 86 c2 0f 00 31 c0 5b
c3 <0f> 0b eb c1 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00
[ 421.357300] RSP: 0018:ffffb34c00b4fcb0 EFLAGS: 00010286
[ 421.357302] RAX: ffffffffc10af180 RBX: ffffa34103ca0000 RCX:
0000000000000000
[ 421.357304] RDX: 000000000000000a RSI: 0000000000000ff8 RDI:
ffffa34103ca0000
[ 421.357305] RBP: 0000000000000005 R08: 0000000000000000 R09:
0000000000000000
[ 421.357306] R10: 0000000000000002 R11: 00000000000000f0 R12:
ffffa34103ca0000
[ 421.357307] R13: ffffa3411c4030b0 R14: ffffa34103ca0000 R15:
ffffa3411f86c9b0
[ 421.357309] FS: 0000000000000000(0000) GS:ffffa3411f840000(0000)
knlGS:0000000000000000
[ 421.357311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 421.357312] CR2: 00005562a24d6010 CR3: 00000003d240a000 CR4:
0000000000340ee0
[ 421.357313] Call Trace:
[ 421.357433] amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
[ 421.357546] amdgpu_device_suspend+0x9b/0x2c0 [amdgpu]
[ 421.357553] ? update_blocked_averages+0x539/0x620
[ 421.357664] amdgpu_pmops_runtime_suspend+0x9e/0x140 [amdgpu]
[ 421.357672] pci_pm_runtime_suspend+0x5e/0x170
[ 421.357678] ? __switch_to_asm+0x40/0x70
[ 421.357683] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.357686] vga_switcheroo_runtime_suspend+0x22/0xb0
[ 421.357689] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.357692] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.357694] __rpm_callback+0x7b/0x130
[ 421.357697] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.357700] rpm_callback+0x1f/0x70
[ 421.357703] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.357705] rpm_suspend+0x174/0x6d0
[ 421.357709] pm_runtime_work+0x94/0xa0
[ 421.357714] process_one_work+0x1da/0x3d0
[ 421.357717] worker_thread+0x4d/0x3e0
[ 421.357720] ? rescuer_thread+0x3f0/0x3f0
[ 421.357722] kthread+0x13e/0x160
[ 421.357726] ? __kthread_bind_mask+0x60/0x60
[ 421.357728] ret_from_fork+0x22/0x40
[ 421.357733] ---[ end trace 7a1af789893080c1 ]---
[ 421.357928] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 421.357931] CPU: 9 PID: 680 Comm: kworker/9:2 Tainted: G W
5.7.9-arch1-1 #1
[ 421.357932] Hardware name: Micro-Star International Co., Ltd. Bravo
17 A4DDR/MS-17FK, BIOS E17FKAMS.113 05/18/2020
[ 421.357936] Workqueue: pm pm_runtime_work
[ 421.358078] RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
[ 421.358081] Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 3a 40 1a 00
4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89
fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
[ 421.358083] RSP: 0018:ffffb34c00b4fca8 EFLAGS: 00010246
[ 421.358085] RAX: 0000000000000010 RBX: ffffa341022cdc00 RCX:
0000000080800076
[ 421.358087] RDX: 0000000080800077 RSI: 0000000000000000 RDI:
ffffa3411c874a80
[ 421.358089] RBP: ffffa3411c874a80 R08: 0000000000000001 R09:
0000000000000001
[ 421.358091] R10: ffffa34118da0660 R11: dead000000000100 R12:
ffffa341022cdd28
[ 421.358092] R13: ffffa3411c4030b0 R14: ffffa34103ca0000 R15:
ffffa3411f86c9b0
[ 421.358095] FS: 0000000000000000(0000) GS:ffffa3411f840000(0000)
knlGS:0000000000000000
[ 421.358097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 421.358098] CR2: 0000000000000038 CR3: 00000003d240a000 CR4:
0000000000340ee0
[ 421.358100] Call Trace:
[ 421.358240] stop_cpsch+0x97/0xc0 [amdgpu]
[ 421.358376] kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
[ 421.358489] amdgpu_device_suspend+0xa7/0x2c0 [amdgpu]
[ 421.358493] ? update_blocked_averages+0x539/0x620
[ 421.358604] amdgpu_pmops_runtime_suspend+0x9e/0x140 [amdgpu]
[ 421.358609] pci_pm_runtime_suspend+0x5e/0x170
[ 421.358612] ? __switch_to_asm+0x40/0x70
[ 421.358616] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.358619] vga_switcheroo_runtime_suspend+0x22/0xb0
[ 421.358622] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.358625] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.358628] __rpm_callback+0x7b/0x130
[ 421.358631] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.358634] rpm_callback+0x1f/0x70
[ 421.358637] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 421.358640] rpm_suspend+0x174/0x6d0
[ 421.358645] pm_runtime_work+0x94/0xa0
[ 421.358648] process_one_work+0x1da/0x3d0
[ 421.358651] worker_thread+0x4d/0x3e0
[ 421.358654] ? rescuer_thread+0x3f0/0x3f0
[ 421.358657] kthread+0x13e/0x160
[ 421.358660] ? __kthread_bind_mask+0x60/0x60
[ 421.358663] ret_from_fork+0x22/0x40
[ 421.358667] Modules linked in: iwlmvm joydev snd_hda_codec_realtek
mousedev amdgpu uvcvideo mac80211 snd_hda_codec_generic
snd_hda_codec_hdmi btusb videobuf2_vmalloc ledtrig_audio
videobuf2_memops btrtl edac_mce_amd btbcm videobuf2_v4l2 btintel kvm_amd
hid_multitouch snd_hda_intel msi_wmi videobuf2_common libarc4 bluetooth
snd_intel_dspcfg hid_generic sparse_keymap kvm snd_hda_codec videodev
nls_iso8859_1 nls_cp437 gpu_sched i2c_algo_bit vfat ecdh_generic fat ttm
irqbypass iwlwifi mc ecc snd_hda_core drm_kms_helper snd_hwdep snd_pcm
crct10dif_pclmul crc32_pclmul r8169 cec ghash_clmulni_intel snd_timer
cfg80211 aesni_intel rc_core syscopyarea sysfillrect snd crypto_simd
sp5100_tco realtek sysimgblt cryptd glue_helper fb_sys_fops psmouse ccp
pcspkr input_leds i2c_piix4 k10temp snd_pci_acp3x soundcore libphy
rfkill wmi battery ac i2c_hid tpm_crb hid tpm_tis tpm_tis_core tpm evdev
mac_hid pinctrl_amd rng_core acpi_cpufreq soc_button_array drm
crypto_user agpgart ip_tables x_tables serio_raw
[ 421.358700] atkbd libps2 xhci_pci xhci_hcd i8042 serio ext4
crc32c_generic crc32c_intel crc16 mbcache jbd2
[ 421.358710] CR2: 0000000000000038
[ 421.358714] ---[ end trace 7a1af789893080c2 ]---
[ 421.358848] RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
[ 421.358851] Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 3a 40 1a 00
4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89
fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
[ 421.358853] RSP: 0018:ffffb34c00b4fca8 EFLAGS: 00010246
[ 421.358855] RAX: 0000000000000010 RBX: ffffa341022cdc00 RCX:
0000000080800076
[ 421.358857] RDX: 0000000080800077 RSI: 0000000000000000 RDI:
ffffa3411c874a80
[ 421.358858] RBP: ffffa3411c874a80 R08: 0000000000000001 R09:
0000000000000001
[ 421.358860] R10: ffffa34118da0660 R11: dead000000000100 R12:
ffffa341022cdd28
[ 421.358862] R13: ffffa3411c4030b0 R14: ffffa34103ca0000 R15:
ffffa3411f86c9b0
[ 421.358864] FS: 0000000000000000(0000) GS:ffffa3411f840000(0000)
knlGS:0000000000000000
[ 421.358866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 421.358868] CR2: 0000000000000038 CR3: 00000003d240a000 CR4:
0000000000340ee0
I suppose the amdgpu module is at fault here?
Greetings
Harvey
--
I am root. If you see me laughing, you'd better have a backup!
More information about the amd-gfx
mailing list