radeon GPU lockup followed by list_del corruption

Jonathon Reinhart jonathon.reinhart at gmail.com
Wed Nov 29 03:09:43 UTC 2017


Hello,

I just encountered a one-off GPU lockup (evidenced by a display
freeze, followed by several "flashes"). The GPU was mostly inactive,
with just Chrome and some gnome-terminals open.

    GPU: Radeon HD 7850
    Kernel: 4.10.13-200.fc25.x86_64

I decided to email this list rather than ignore this seemingly random
event because:

1. The GPU, although a few years old, wasn't doing any heavy
processing, which makes a heat-related fault unlikely.

2. While driver bugs may manifest themselves in light of hardware
troubles, the "list_del corruption" BUG is highly undesirable -- a GPU
lockup shouldn't panic the box.

I've uploaded the full dmesg log:
https://gist.github.com/JonathonReinhart/ede339261daa00e6924a92cc32c6e26e
Notable snippets are posted at the end of this message.

Let me know if I can provide any more information.

Regards,

Jonathon Reinhart





Nov 28 21:24:29.743859 kernel: WARNING: CPU: 3 PID: 32172 at
drivers/gpu/drm/radeon/radeon_object.c:84
radeon_ttm_bo_destroy+0xf6/0x100 [radeon]
...
Nov 28 21:25:03.899302 kernel: radeon 0000:01:00.0: still active bo inside vm
Nov 28 21:25:03.899353 kernel: list_del corruption. prev->next should
be ffff913cfff40500, but was ffff913dc1dc6380
Nov 28 21:25:03.899377 kernel: ------------[ cut here ]------------
Nov 28 21:25:03.899396 kernel: kernel BUG at lib/list_debug.c:52!
Nov 28 21:25:03.899542 kernel: invalid opcode: 0000 [#1] SMP
Nov 28 21:25:03.899665 kernel: Modules linked in: veth
nf_conntrack_netlink xt_addrtype br_netfilter dm_thin_pool
dm_persistent_data dm_bio_prison loop xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp
llc ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security iptable_raw iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
libcrc32c iptable_mangle iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables binfmt_misc fuse usblp raid1 intel_rapl
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt
eeepc_wmi iTCO_vendor_support asus_wmi sparse_keymap rfkill kvm
Nov 28 21:25:03.899694 kernel:  mxm_wmi irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore
intel_rapl_perf snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi snd_hda_intel snd_hda_codec joydev lpc_ich i2c_i801
snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd
soundcore mei_me video wmi mei shpchp tpm_tis tpm_tis_core tpm nfsd
auth_rpcgss nfs_acl lockd grace sunrpc hid_logitech_hidpp
hid_logitech_dj ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
crc32c_intel i2c_algo_bit drm_kms_helper serio_raw ttm drm
firewire_ohci pata_marvell r8169 firewire_core crc_itu_t mii fjes
Nov 28 21:25:03.899713 kernel: CPU: 0 PID: 32172 Comm: chrome Tainted:
G        W       4.10.13-200.fc25.x86_64 #1
Nov 28 21:25:03.899831 kernel: Hardware name: System manufacturer
System Product Name/P8P67 LE, BIOS 3801 09/12/2013
Nov 28 21:25:03.899897 kernel: task: ffff9140c8d7a580 task.stack:
ffff9e9601a64000
Nov 28 21:25:03.899915 kernel: RIP: 0010:__list_del_entry_valid+0x75/0x90
Nov 28 21:25:03.899934 kernel: RSP: 0018:ffff9e9601a67b40 EFLAGS: 00010286
Nov 28 21:25:03.899948 kernel: RAX: 0000000000000054 RBX:
ffff913cfff40500 RCX: 0000000000000006
Nov 28 21:25:03.899965 kernel: RDX: 0000000000000000 RSI:
0000000000000246 RDI: ffff9140dec0e0e0
Nov 28 21:25:03.899981 kernel: RBP: ffff9e9601a67b40 R08:
00000000000c7b0c R09: 0000000000002854
Nov 28 21:25:03.900083 kernel: R10: 0000000000000004 R11:
ffffffffa2224d0f R12: ffff913dc51c5c00
Nov 28 21:25:03.900104 kernel: R13: ffff913d6da30400 R14:
ffff913ce3f7f000 R15: ffff9140c7578000
Nov 28 21:25:03.900121 kernel: FS:  00007f57694e1f80(0000)
GS:ffff9140dec00000(0000) knlGS:0000000000000000
Nov 28 21:25:03.900190 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 28 21:25:03.900211 kernel: CR2: 000055f41a64728f CR3:
0000000112551000 CR4: 00000000000426f0
Nov 28 21:25:03.900275 kernel: Call Trace:
Nov 28 21:25:03.900295 kernel:  radeon_vm_fini+0x89/0x250 [radeon]
Nov 28 21:25:03.900317 kernel:  radeon_driver_postclose_kms+0xf1/0x150 [radeon]
Nov 28 21:25:03.900333 kernel:  drm_release+0x28c/0x3a0 [drm]
Nov 28 21:25:03.900350 kernel:  __fput+0xdf/0x1e0
Nov 28 21:25:03.900365 kernel:  ____fput+0xe/0x10
Nov 28 21:25:03.900381 kernel:  task_work_run+0x80/0xa0
Nov 28 21:25:03.900397 kernel:  do_exit+0x2c8/0xb80
Nov 28 21:25:03.900416 kernel:  do_group_exit+0x47/0xb0
Nov 28 21:25:03.900454 kernel:  get_signal+0x289/0x630
Nov 28 21:25:03.900470 kernel:  do_signal+0x37/0x690
Nov 28 21:25:03.900487 kernel:  ? padata_reorder+0x160/0x1d0
Nov 28 21:25:03.900504 kernel:  ? is_prefetch.isra.18+0x95/0x1a0
Nov 28 21:25:03.900518 kernel:  ? padata_reorder+0x160/0x1d0
Nov 28 21:25:03.900535 kernel:  ? mm_fault_error+0x13c/0x190
Nov 28 21:25:03.900552 kernel:  ? __do_page_fault+0x4ad/0x4e0
Nov 28 21:25:03.900569 kernel:  exit_to_usermode_loop+0x76/0xb0
Nov 28 21:25:03.900603 kernel:  prepare_exit_to_usermode+0x40/0x50
Nov 28 21:25:03.900625 kernel:  retint_user+0x8/0x10
Nov 28 21:25:03.900641 kernel: RIP: 0033:0x7f5762ae6a46
Nov 28 21:25:03.900658 kernel: RSP: 002b:00007fff8274e078 EFLAGS: 00010246
Nov 28 21:25:03.900675 kernel: RAX: 00007f574a30c600 RBX:
00001e280f68be00 RCX: 7e0a03044a08000c
Nov 28 21:25:03.900688 kernel: RDX: 0000000000000008 RSI:
7e0a03044a08000c RDI: 00007f574a30c600
Nov 28 21:25:03.900706 kernel: RBP: 0000000000000040 R08:
0000000000000000 R09: 0000000000000001
Nov 28 21:25:03.900721 kernel: R10: 0013eb23be7a8564 R11:
0000000000000001 R12: 00001e280f24b830
Nov 28 21:25:03.900740 kernel: R13: 0000000000000000 R14:
00001e280f1e8e00 R15: 00001e280f24b670
Nov 28 21:25:03.900757 kernel: Code: 08 26 c9 a1 e8 ec 66 d9 ff 0f 0b
48 89 fe 48 c7 c7 40 26 c9 a1 e8 db 66 d9 ff 0f 0b 48 89 fe 48 c7 c7
78 26 c9 a1 e8 ca 66 d9 ff <0f> 0b 48 89 fe 48 c7 c7 b8 26 c9 a1 e8 b9
66 d9 ff 0f 0b 0f 1f
Nov 28 21:25:03.900783 kernel: RIP: __list_del_entry_valid+0x75/0x90
RSP: ffff9e9601a67b40
Nov 28 21:25:03.900830 kernel: ---[ end trace 336b91ace1e9dcee ]---
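
For context on the "list_del corruption" line above: as far as I can
tell, the message comes from a sanity check run before an entry is
unlinked from a doubly linked list, which verifies that the entry's
neighbours still point back at it. The following is a minimal
user-space sketch of that idea, not the kernel's code. The names
list_node, list_init, list_add, list_del_entry_valid and
list_del_checked are my own for illustration, and where the kernel
log shows a BUG, this sketch just returns false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Circular doubly linked list node (hypothetical, modelled loosely on
 * the kernel's struct list_head). */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry directly after head. */
static void list_add(struct list_node *entry, struct list_node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Return true only if both neighbours still point back at entry. */
static bool list_del_entry_valid(const struct list_node *entry)
{
    if (entry->prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        return false;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        return false;
    }
    return true;
}

/* Unlink entry only if the check passes; refuse to touch a list that
 * already looks corrupted. */
static bool list_del_checked(struct list_node *entry)
{
    if (!list_del_entry_valid(entry))
        return false;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

Read this way, the message in the log would mean that the entry being
removed in radeon_vm_fini (ffff913cfff40500) was no longer what its
predecessor pointed to. That would be consistent with the entry having
already been removed or freed elsewhere, though I have not confirmed
that.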


More information about the amd-gfx mailing list