[Bug 110413] GPU crash and failed reset leading to deadlock on Polaris 22 XL [Radeon RX Vega M GL]
bugzilla-daemon at freedesktop.org
Fri Apr 12 14:52:44 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=110413
--- Comment #4 from Rémi Verschelde <rverschelde at gmail.com> ---
Pasting the relevant output from attachment 143951 so that its keywords can be
found by Bugzilla searches.
```
[ 325.087186] mce: CPU7: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087187] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087188] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087189] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087224] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087225] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087226] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087226] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087227] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.087228] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 325.089212] mce: CPU7: Core temperature/speed normal
[ 325.089213] mce: CPU0: Package temperature/speed normal
[ 325.089214] mce: CPU3: Core temperature/speed normal
[ 325.089214] mce: CPU4: Package temperature/speed normal
[ 325.089215] mce: CPU7: Package temperature/speed normal
[ 325.089215] mce: CPU3: Package temperature/speed normal
[ 325.089248] mce: CPU6: Package temperature/speed normal
[ 325.089248] mce: CPU5: Package temperature/speed normal
[ 325.089249] mce: CPU2: Package temperature/speed normal
[ 325.089250] mce: CPU1: Package temperature/speed normal
[ 565.312183] amdgpu 0000:01:00.0: GPU fault detected: 147 0x0040d508 for process pid 0 thread pid 0
[ 565.312194] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00169208
[ 565.312200] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0xFFFFFFFF
[ 565.312209] amdgpu 0000:01:00.0: VM fault (0xff, vmid 15, pasid 0) at page 1479176, write from '\xff\xff\xff\xff' (0xffffffff) (511)
[ 565.312219] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00405508 for process pid 0 thread pid 0
[ 565.312224] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0xFFFFFFFF
[ 565.312229] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0xFFFFFFFF
[ 565.312236] amdgpu 0000:01:00.0: VM fault (0xff, vmid 15, pasid 0) at page 4294967295, write from '\xff\xff\xff\xff' (0xffffffff) (511)
[ 565.312244] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00485508 for process pid 0 thread pid 0
[ 565.312248] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0xFFFFFFFF
[ 565.312252] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0xFFFFFFFF
[ 565.312258] amdgpu 0000:01:00.0: VM fault (0xff, vmid 15, pasid 0) at page 4294967295, write from '\xff\xff\xff\xff' (0xffffffff) (511)
<snip>
[ 565.312378] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00785508 for process pid 0 thread pid 0
[ 565.312383] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0xFFFFFFFF
[ 565.312387] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0xFFFFFFFF
[ 565.312393] amdgpu 0000:01:00.0: VM fault (0xff, vmid 15, pasid 0) at page 4294967295, write from '\xff\xff\xff\xff' (0xffffffff) (511)
[ 575.625913] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=117668, emitted seq=117670
[ 575.625950] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process starcrawlers.x8 pid 9151 thread starcrawle:cs0 pid 9162
[ 575.625953] amdgpu 0000:01:00.0: GPU reset begin!
[ 575.626419] amdgpu: [powerplay]
last message was failed ret is 65535
[ 575.626420] amdgpu: [powerplay]
failed to send message 281 ret is 65535
[ 575.636259] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
[ 575.651311] amdgpu: [powerplay]
last message was failed ret is 65535
[ 575.651312] amdgpu: [powerplay]
failed to send message 133 ret is 65535
[ 575.651316] amdgpu: [powerplay]
last message was failed ret is 65535
[ 575.651316] amdgpu: [powerplay]
failed to send message 310 ret is 65535
[ 575.651317] amdgpu: [powerplay]
last message was failed ret is 65535
[ 575.651317] amdgpu: [powerplay]
failed to send message 5e ret is 65535
<snip>
[ 575.651340] amdgpu: [powerplay]
last message was failed ret is 65535
[ 575.651341] amdgpu: [powerplay]
failed to send message 84 ret is 65535
[ 575.651341] amdgpu: [powerplay] Failed to force to switch arbf0!
[ 575.651342] amdgpu: [powerplay] [disable_dpm_tasks] Failed to disable DPM!
[ 575.651360] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
[ 575.769673] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 575.769740] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[ 575.888355] cp is busy, skip halt cp
[ 576.007183] rlc is busy, skip halt rlc
[ 576.008188] amdgpu 0000:01:00.0: GPU pci config reset
[ 576.126260] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* ASIC reset failed with err r, -22 for drm dev, 0000:01:00.0
[ 576.127736] Asynchronous wait on fence drm_sched:gfx:1ca87 timed out (hint:submit_notify+0x0/0x58 [i915])
[ 576.127768] Asynchronous wait on fence drm_sched:gfx:1ca82 timed out (hint:submit_notify+0x0/0x58 [i915])
[ 576.127788] Asynchronous wait on fence i915:Xorg[3673]/0:6455 timed out (hint:intel_atomic_commit_ready+0x0/0x4c [i915])
[ 581.126683] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 581.126734] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D654 (len 62, WS 0, PS 0) @ 0xD670
[ 581.126754] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C410 (len 114, WS 0, PS 8) @ 0xC42B
[ 581.126755] [drm] asic atom init failed!
[ 581.126765] amdgpu 0000:01:00.0: GPU reset(2) failed
[ 581.126766] amdgpu 0000:01:00.0: GPU reset end with ret = -22
[ 581.126777] [drm] Skip scheduling IBs!
[ 581.126782] [drm] Skip scheduling IBs!
[ 581.126784] [drm] Skip scheduling IBs!
[ 581.126785] [drm] Skip scheduling IBs!
[ 581.126786] [drm] Skip scheduling IBs!
[ 581.126787] [drm] Skip scheduling IBs!
[ 581.126789] [drm] Skip scheduling IBs!
[ 581.126790] [drm] Skip scheduling IBs!
[ 581.126791] [drm] Skip scheduling IBs!
[ 591.487678] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=117670, emitted seq=117670
[ 591.487716] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process starcrawlers.x8 pid 9151 thread starcrawle:cs0 pid 9162
[ 591.487719] amdgpu 0000:01:00.0: GPU reset begin!
[ 591.488418] amdgpu: [powerplay]
last message was failed ret is 65535
[ 591.488419] amdgpu: [powerplay]
failed to send message 281 ret is 65535
[ 591.488495] WARNING: CPU: 2 PID: 666 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:788 dm_suspend+0x4e/0x60 [amdgpu]
[ 591.488496] Modules linked in: cmac rfcomm ccm msr ip6t_REJECT
nf_reject_ipv6 xt_comment ip6table_mangle ip6table_nat nf_nat_ipv6 ip6table_raw
nf_log_ipv6 ip6table_filter ip6_tables xt_recent ipt_IFWLOG ipt_psd xt_set
ip_set_hash_ip ip_set ipt_REJECT nf_reject_ipv4 xt_conntrack xt_hashlimit
xt_addrtype xt_mark iptable_mangle iptable_nat nf_nat_ipv4 xt_CT xt_tcpudp
iptable_raw nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG
nf_conntrack_sane nf_conntrack_netlink nfnetlink nf_nat_tftp nf_nat_snmp_basic
nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp
nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp
nf_conntrack_proto_gre nf_conntrack_netbios_ns nf_conntrack_broadcast
nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter af_packet bnep
binfmt_misc fuse nls_iso8859_1 nls_cp437 vfat fat dm_mirror dm_region_hash
dm_log dm_mod snd_hda_codec_hdmi arc4 joydev
[ 591.488509] intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm
hid_sensor_incl_3d hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_rotation
hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf
hid_sensor_iio_common industrialio irqbypass hid_multitouch crc32_pclmul
crc32c_intel ghash_clmulni_intel spi_pxa2xx_platform 8250_dw iwlmvm
hid_sensor_hub aesni_intel iTCO_wdt iTCO_vendor_support mac80211
snd_hda_codec_realtek hid_generic aes_x86_64 input_leds tpm_crb crypto_simd
cryptd snd_hda_codec_generic glue_helper ledtrig_audio intel_cstate psmouse
intel_uncore iwlwifi snd_hda_intel thermal snd_hda_codec uvcvideo btusb
snd_hda_core btbcm videobuf2_vmalloc btrtl videobuf2_memops videobuf2_v4l2
btintel videobuf2_common cfg80211 snd_hwdep videodev snd_pcm bluetooth media
snd_timer intel_rapl_perf pinctrl_sunrisepoint ucsi_acpi typec_ucsi usbhid
typec tpm_tis pinctrl_intel intel_wmi_thunderbolt snd tpm_tis_core hp_wmi
soundcore tpm wmi_bmof idma64 ecdh_generic
[ 591.488521] int3400_thermal battery virt_dma button acpi_thermal_rel
rtsx_pci_ms intel_vbtn i2c_i801 acpi_pad hp_wireless ac rfkill sparse_keymap
int3403_thermal memstick mei_me mei intel_lpss_pci intel_pch_thermal intel_lpss
processor_thermal_device intel_ishtp_hid int340x_thermal_zone
intel_soc_dts_iosf evdev nvram sch_fq_codel efivarfs ip_tables x_tables ipv6
crc_ccitt autofs4 amdgpu xhci_pci rtsx_pci_sdmmc xhci_hcd mmc_block mmc_core
usbcore serio_raw chash amd_iommu_v2 rtsx_pci gpu_sched intel_ish_ipc ttm
intel_ishtp usb_common i915 i2c_hid hid i2c_algo_bit drm_kms_helper wmi video
drm
[ 591.488549] CPU: 2 PID: 666 Comm: kworker/2:2 Not tainted 5.0.7-desktop-4.mga7 #1
[ 591.488550] Hardware name: HP HP Spectre x360 Convertible 15-ch0xx/83BB, BIOS F.24 11/06/2018
[ 591.488552] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 591.488627] RIP: 0010:dm_suspend+0x4e/0x60 [amdgpu]
[ 591.488627] Code: 00 48 89 83 70 cb 00 00 e8 af fc ff ff 48 89 df e8 67 75 00 00 48 8b bb 60 b3 00 00 be 08 00 00 00 e8 16 8f 0a 00 31 c0 5b c3 <0f> 0b eb c1 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00
[ 591.488628] RSP: 0018:ffffb50201f97d20 EFLAGS: 00010282
[ 591.488629] RAX: ffffffffc08a3e00 RBX: ffff93f4a35c0000 RCX: 0000000000000012
[ 591.488629] RDX: 0000000000000080 RSI: 0000000000000001 RDI: ffff93f4a35c0000
[ 591.488629] RBP: ffff93f4a35ccb98 R08: 0000000000000492 R09: 0000000000000004
[ 591.488630] R10: 0000000000000000 R11: 0000000000000001 R12: ffff93f4a35c0000
[ 591.488630] R13: ffffffffc09e25a0 R14: 0000000000000000 R15: ffff93f4a35c3498
[ 591.488631] FS: 0000000000000000(0000) GS:ffff93f4b1c80000(0000) knlGS:0000000000000000
[ 591.488631] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 591.488632] CR2: 00007f8c18a40a38 CR3: 000000033220e002 CR4: 00000000003606e0
[ 591.488632] Call Trace:
[ 591.488676] amdgpu_device_ip_suspend_phase1+0x94/0xc0 [amdgpu]
[ 591.488721] amdgpu_device_ip_suspend+0x1b/0x60 [amdgpu]
[ 591.488796] amdgpu_device_pre_asic_reset+0x9e/0x260 [amdgpu]
[ 591.488817] amdgpu_device_gpu_recover+0x87/0x7e0 [amdgpu]
[ 591.488828] ? drm_err+0x72/0x90 [drm]
[ 591.488882] amdgpu_job_timedout+0xfc/0x120 [amdgpu]
[ 591.488884] drm_sched_job_timedout+0x39/0x60 [gpu_sched]
[ 591.488887] process_one_work+0x200/0x400
[ 591.488888] worker_thread+0x2d/0x3d0
[ 591.488889] ? process_one_work+0x400/0x400
[ 591.488891] kthread+0x112/0x130
[ 591.488892] ? kthread_create_on_node+0x60/0x60
[ 591.488894] ret_from_fork+0x35/0x40
[ 591.488895] ---[ end trace 356c1ae357df635c ]---
[ 591.499325] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
```
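For anyone triaging similar reports, here is a minimal Python sketch that scans dmesg output for the two failure signatures that recur in the log: negative error codes on failed suspend/reset steps, and all-ones values where register or SMU reads should have returned real data. It assumes the codes are negated Linux errno values (on Linux, 110 is ETIMEDOUT and 22 is EINVAL). Treating all-ones values (0xFFFFFFFF, or 65535 = 0xFFFF) as a sign that the GPU had stopped responding is a common heuristic, not something the driver itself states. The function name `triage` and the regex patterns are hypothetical.

```python
import re

# Assumption: negative numbers in the log are negated Linux errno values.
# These are the Linux numbers; other platforms use different values.
LINUX_ERRNO = {22: "EINVAL", 110: "ETIMEDOUT"}

# A dmesg line such as "[ 575.636259] message..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# A negative integer not glued to a word or version string, e.g. "failed -110"
ERRNO_RE = re.compile(r"(?<![\w.])-(\d+)\b")
# A hex value made only of 0xff bytes (at least two), e.g. 0xFFFFFFFF
ALL_ONES_RE = re.compile(r"0x(?:ff){2,}\b", re.IGNORECASE)


def triage(lines):
    """Yield (timestamp, finding, message) for suspicious dmesg lines (heuristic)."""
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue  # continuation line without a timestamp
        ts, msg = float(m.group(1)), m.group(2)
        # Known negative errno values, e.g. "-110" -> ETIMEDOUT
        for num in ERRNO_RE.findall(msg):
            name = LINUX_ERRNO.get(int(num))
            if name:
                yield ts, f"-{num} ({name})", msg
        # All-ones reads, including the powerplay "ret is 65535" (0xFFFF)
        if ALL_ONES_RE.search(msg) or "ret is 65535" in msg:
            yield ts, "all-ones value (GPU not responding?)", msg
```

Feeding it the lines above would flag, among others, the `-110` from the `vce_v3_0` suspend, the all-ones `VM_CONTEXT1_PROTECTION_FAULT_*` reads, and the `ret = -22` at the end of the reset.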
More information about the dri-devel mailing list