[Bug 109955] amdgpu [RX Vega 64] system freeze while gaming

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Oct 19 21:26:58 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #114 from Rodney A Morris <rodamorris at gmail.com> ---
To rule out possible hardware issues, I purchased another Vega 64 card.  This
time a factory overclocked card.  Since installing the card, I have experienced
three lock ups.  Two playing Stellaris and one while playing a youtube video. 
After playing Stellaris without issue two weeks ago, the computer locked up
twice last night.  While my previous problems seemed to be, in part, linked to
a circular lock dependence, the last logs indicate something different.  I'm
seeing a lot of powerplay errors after the fence timeout.  Hope this new
information provides some insight into the problem.

         /:-------------:\          rmorris at ezra.blanchardmorris.net 
       :-------------------::        -------------------------------- 
     :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
   /-----------omMMMNNNMMD  ---:     Kernel: 5.3.6-200.fc30.x86_64 
  :-----------sMMMMNMNMP.    ---:    Uptime: 16 hours, 21 mins 
 :-----------:MMMdP-------    ---\   Packages: 2214 (rpm), 36 (flatpak) 
,------------:MMMd--------    ---:   Shell: bash 5.0.7 
:------------:MMMd-------    .---:   Resolution: 2560x1440 
:----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
:--     .+shhhMMMmhhy++   .------/   WM: Mutter 
:-    -------:MMMd--------------:    WM Theme: Adwaita 
:-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
:-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
:-- :dMNdhhdNMMNo------------;       Terminal: tilix 
:---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
:------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
:---------------------://            Memory: 2814MiB / 32036MiB 


Card:

MSI Vega 64 OC (Card works fine under windows 10)

Game being played:

Stellaris

Native Game

Description of Event:
Screen goes blank and music and sound continues to play before computer locks
up or reboots.

relevant dmesg from crash:
[ 4244.670269] perf: interrupt took too long (2502 > 2500), lowering
kernel.perf_event_max_sample_rate to 79000
[ 4298.241156] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out or interrupted!
[ 4304.385587] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout,
signaled seq=60549844, emitted seq=60549846
[ 4304.385634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[ 4304.385637] amdgpu 0000:06:00.0: GPU reset begin!
[ 4304.402938] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4304.402945] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4304.402947] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4304.402948] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4304.404006] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4308.481068] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 4314.625180] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0]
hw_done or flip_done timed out
[ 4324.865057] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 4335.105035] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:45:plane-5] flip_done timed out
[ 4336.695112] amdgpu: [powerplay] No response from smu
[ 4336.695128] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[ 4338.307125] amdgpu: [powerplay] No response from smu
[ 4339.922039] amdgpu: [powerplay] No response from smu
[ 4339.922043] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[ 4341.541675] amdgpu: [powerplay] No response from smu
[ 4343.162102] amdgpu: [powerplay] No response from smu
[ 4343.162105] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[ 4343.221953] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif
line:634
[ 4343.221962] ------------[ cut here ]------------
[ 4343.222070] WARNING: CPU: 0 PID: 16500 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332
generic_reg_wait.cold+0x31/0x53 [amdgpu]
[ 4343.222072] Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge
stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat
ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter ip_tables cmac bnep nct6775
hwmon_vid intel_rapl_msr intel_rapl_common vfat fat fuse x86_pkg_temp_thermal
intel_powerclamp coretemp iwlmvm kvm_intel iTCO_wdt iTCO_vendor_support
mac80211 kvm snd_hda_codec_realtek irqbypass snd_hda_codec_generic
snd_hda_codec_hdmi libarc4 ledtrig_audio crct10dif_pclmul snd_hda_intel
crc32_pclmul iwlwifi snd_hda_codec snd_hda_core btusb ghash_clmulni_intel btrtl
intel_cstate snd_hwdep btbcm btintel intel_uncore snd_seq snd_seq_device
intel_rapl_perf bluetooth
[ 4343.222099]  mxm_wmi cfg80211 snd_pcm joydev ecdh_generic ecc mei_me
snd_timer rfkill snd mei i2c_i801 soundcore lpc_ich binfmt_misc auth_rpcgss
sunrpc amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper crc32c_intel uas
mpt3sas igb drm e1000e nvme usb_storage dca i2c_algo_bit raid_class nvme_core
scsi_transport_sas wmi
[ 4343.222114] CPU: 0 PID: 16500 Comm: kworker/0:1 Not tainted
5.3.6-200.fc30.x86_64+debug #1
[ 4343.222115] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[ 4343.222119] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 4343.222167] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[ 4343.222169] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 f8 9d 73 c0 e8 60 46 b0
fa 83 7b 20 01 0f 84 02 ee fd ff 48 c7 c7 f0 9c 73 c0 e8 4a 46 b0 fa <0f> 0b e9
ef ed fd ff 48 c7 c7 f0 9c 73 c0 89 54 24 04 e8 33 46 b0
[ 4343.222170] RSP: 0018:ffffabda8729b690 EFLAGS: 00010246
[ 4343.222172] RAX: 0000000000000024 RBX: ffff9ceeab58f700 RCX:
0000000000000006
[ 4343.222173] RDX: 0000000000000000 RSI: ffff9ceeb50c8e50 RDI:
ffff9ceebe5d9e00
[ 4343.222174] RBP: 000000000000000a R08: 000003f33c33ca38 R09:
0000000000000000
[ 4343.222175] R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000035af
[ 4343.222176] R13: 0000000000000dad R14: 0000000000000001 R15:
0000000000000dac
[ 4343.222178] FS:  0000000000000000(0000) GS:ffff9ceebe400000(0000)
knlGS:0000000000000000
[ 4343.222179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4343.222180] CR2: 00007f1480ef70c0 CR3: 0000000703f30002 CR4:
00000000003606f0
[ 4343.222182] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 4343.222183] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 4343.222184] Call Trace:
[ 4343.222237]  dce_mi_free_dmif+0xef/0x150 [amdgpu]
[ 4343.222285]  dce110_reset_hw_ctx_wrap+0x15f/0x200 [amdgpu]
[ 4343.222333]  dce110_apply_ctx_to_hw+0x4b/0x530 [amdgpu]
[ 4343.222365]  ? amdgpu_pm_compute_clocks+0xc9/0x5f0 [amdgpu]
[ 4343.222414]  ? dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[ 4343.222461]  dc_commit_state+0x26b/0x590 [amdgpu]
[ 4343.222514]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
[ 4343.222521]  ? __lock_acquire+0x247/0x1910
[ 4343.222525]  ? find_held_lock+0x32/0x90
[ 4343.222529]  ? find_held_lock+0x32/0x90
[ 4343.222533]  ? sched_clock+0x5/0x10
[ 4343.222536]  ? mark_held_locks+0x50/0x80
[ 4343.222540]  ? __lock_acquire+0x247/0x1910
[ 4343.222545]  ? wake_up_klogd+0x37/0x40
[ 4343.222549]  ? find_held_lock+0x32/0x90
[ 4343.222552]  ? mark_held_locks+0x50/0x80
[ 4343.222556]  ? _raw_spin_unlock_irq+0x29/0x40
[ 4343.222559]  ? lockdep_hardirqs_on+0xf0/0x180
[ 4343.222561]  ? _raw_spin_unlock_irq+0x29/0x40
[ 4343.222564]  ? wait_for_completion_timeout+0x75/0x190
[ 4343.222576]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 4343.222622]  ? amdgpu_dm_audio_eld_notify+0x60/0x60 [amdgpu]
[ 4343.222628]  commit_tail+0x3c/0x70 [drm_kms_helper]
[ 4343.222634]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[ 4343.222640]  drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
[ 4343.222647]  drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
[ 4343.222698]  dm_suspend+0x20/0x60 [amdgpu]
[ 4343.222726]  amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[ 4343.222755]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[ 4343.222801]  amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
[ 4343.222849]  amdgpu_device_gpu_recover+0x260/0x934 [amdgpu]
[ 4343.222893]  amdgpu_job_timedout+0x115/0x140 [amdgpu]
[ 4343.222899]  drm_sched_job_timedout+0x44/0xa0 [gpu_sched]
[ 4343.222903]  process_one_work+0x272/0x5a0
[ 4343.222908]  worker_thread+0x50/0x3b0
[ 4343.222915]  kthread+0x108/0x140
[ 4343.222916]  ? process_one_work+0x5a0/0x5a0
[ 4343.222918]  ? kthread_park+0x80/0x80
[ 4343.222921]  ret_from_fork+0x3a/0x50
[ 4343.222929] irq event stamp: 82808
[ 4343.222931] hardirqs last  enabled at (82807): [<ffffffffbb1716eb>]
console_unlock+0x46b/0x5d0
[ 4343.222935] hardirqs last disabled at (82808): [<ffffffffbb0038da>]
trace_hardirqs_off_thunk+0x1a/0x20
[ 4343.222938] softirqs last  enabled at (82794): [<ffffffffbbe0035d>]
__do_softirq+0x35d/0x45d
[ 4343.222942] softirqs last disabled at (82787): [<ffffffffbb0f2077>]
irq_exit+0xf7/0x100
[ 4343.222943] ---[ end trace 71731c9cc205c24d ]---
[ 4344.758203] amdgpu: [powerplay] No response from smu
[ 4346.363061] amdgpu: [powerplay] No response from smu
[ 4346.363065] amdgpu: [powerplay] Failed to send message: 0x26, ret value: 0x0
[ 4347.973948] amdgpu: [powerplay] No response from smu
[ 4349.588168] amdgpu: [powerplay] No response from smu
[ 4349.588173] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1,
error code: 0x0
[ 4351.152764] amdgpu: [powerplay] No response from smu
[ 4352.722063] amdgpu: [powerplay] No response from smu
[ 4352.722068] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3,
error code: 0x0
[ 4354.325541] amdgpu: [powerplay] No response from smu
[ 4355.924138] amdgpu: [powerplay] No response from smu
[ 4355.924141] amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0x0
[ 4357.537736] amdgpu: [powerplay] No response from smu
[ 4359.154141] amdgpu: [powerplay] No response from smu
[ 4359.154146] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4,
error code: 0x0
[ 4360.760856] amdgpu: [powerplay] No response from smu
[ 4362.372410] amdgpu: [powerplay] No response from smu
[ 4362.372414] amdgpu: [powerplay] Failed message: 0xa, input parameter:
0xa0b000, error code: 0x0
[ 4363.985961] amdgpu: [powerplay] No response from smu
[ 4365.599325] amdgpu: [powerplay] No response from smu
[ 4365.599331] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[ 4367.214945] amdgpu: [powerplay] No response from smu
[ 4368.829650] amdgpu: [powerplay] No response from smu
[ 4368.829655] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[ 4370.443783] amdgpu: [powerplay] No response from smu
[ 4372.057288] amdgpu: [powerplay] No response from smu
[ 4372.057293] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[ 4372.074301] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.074308] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.074310] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.074312] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.074569] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.091832] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.091837] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.091839] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.091840] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.091889] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.109371] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.109376] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.109378] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.109380] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.126998] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.127002] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.127009] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.127021] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.127024] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.127083] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.144452] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.144457] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.144458] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.144460] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.144514] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.161992] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.161997] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.161999] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.162001] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.162086] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.179534] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.179538] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.179540] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.179542] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.179674] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.197074] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.197079] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.197081] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.197082] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.197131] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.214616] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
[ 4372.267239] amdgpu: [powerplay] Failed to send message: 0x61, ret value:
0xffffffff

Relevant journalctl messages:

Oct 18 21:49:47 ezra.blanchardmorris.net kernel: perf: interrupt took too long
(2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
Oct 18 21:50:47 ezra.blanchardmorris.net kernel:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed
out or interrupted!
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring page1 timeout, signaled seq=60549844, emitted
seq=60549846
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:
Uncorrected (Non-Fatal) error received: 0000:00:03.0
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:
PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer,
(Requester ID)
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:  
device [8086:6f08] error status/mask=00004000/00000000
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:   
[14] CmpltTO                (First)
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:
Device recovery failed
Oct 18 21:50:51 ezra.blanchardmorris.net kernel:
[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
Oct 18 21:50:57 ezra.blanchardmorris.net kernel: [drm:amdgpu_dm_atomic_check
[amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Oct 18 21:51:07 ezra.blanchardmorris.net kernel:
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
Oct 18 21:51:18 ezra.blanchardmorris.net kernel:
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
[PLANE:45:plane-5] flip_done timed out
Oct 18 21:51:19 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:19 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0xe, input parameter: 0x0, error code: 0x0
Oct 18 21:51:21 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:22 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:22 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x42, input parameter: 0x1, error code: 0x0
Oct 18 21:51:24 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x24, input parameter: 0x0, error code: 0x0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: [drm] REG_WAIT timeout 10us *
3500 tries - dce_mi_free_dmif line:634
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: ------------[ cut here
]------------
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: WARNING: CPU: 0 PID: 16500 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332
generic_reg_wait.cold+0x31/0x53 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Modules linked in: rfcomm
xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables cmac bnep nct6775 hwmon_vid
intel_rapl_msr intel_rapl_common vfat fat fuse x86_pkg_temp_thermal
intel_powerclamp coretemp iwlmvm kvm_intel iTCO_wdt iTCO_vendor_support
mac80211 kvm snd_hda_codec_realtek irqbypass snd_hda_codec_generic
snd_hda_codec_hdmi libarc4 ledtrig_audio crct10dif_pclmul snd_hda_intel
crc32_pclmul iwlwifi snd_hda_codec snd_hda_core btusb ghash_clmulni_intel btrtl
intel_cstate snd_hwdep btbcm btintel intel_uncore snd_seq snd_seq_device
intel_rapl_perf bluetooth
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  mxm_wmi cfg80211 snd_pcm
joydev ecdh_generic ecc mei_me snd_timer rfkill snd mei i2c_i801 soundcore
lpc_ich binfmt_misc auth_rpcgss sunrpc amdgpu amd_iommu_v2 gpu_sched ttm
drm_kms_helper crc32c_intel uas mpt3sas igb drm e1000e nvme usb_storage dca
i2c_algo_bit raid_class nvme_core scsi_transport_sas wmi
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: CPU: 0 PID: 16500 Comm:
kworker/0:1 Not tainted 5.3.6-200.fc30.x86_64+debug #1
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Hardware name: To Be Filled By
O.E.M. To Be Filled By O.E.M./X99 Taichi, BIOS P1.80 04/06/2018
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Workqueue: events
drm_sched_job_timedout [gpu_sched]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RIP:
0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Code: 4c 24 18 44 89 fa 89 ee
48 c7 c7 f8 9d 73 c0 e8 60 46 b0 fa 83 7b 20 01 0f 84 02 ee fd ff 48 c7 c7 f0
9c 73 c0 e8 4a 46 b0 fa <0f> 0b e9 ef ed fd ff 48 c7 c7 f0 9c 73 c0 89 54 24 04
e8 33 46 b0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RSP: 0018:ffffabda8729b690
EFLAGS: 00010246
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RAX: 0000000000000024 RBX:
ffff9ceeab58f700 RCX: 0000000000000006
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RDX: 0000000000000000 RSI:
ffff9ceeb50c8e50 RDI: ffff9ceebe5d9e00
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RBP: 000000000000000a R08:
000003f33c33ca38 R09: 0000000000000000
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: R10: 0000000000000000 R11:
0000000000000000 R12: 00000000000035af
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: R13: 0000000000000dad R14:
0000000000000001 R15: 0000000000000dac
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: FS:  0000000000000000(0000)
GS:ffff9ceebe400000(0000) knlGS:0000000000000000
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: CS:  0010 DS: 0000 ES: 0000
CR0: 0000000080050033
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: CR2: 00007f1480ef70c0 CR3:
0000000703f30002 CR4: 00000000003606f0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Call Trace:
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  dce_mi_free_dmif+0xef/0x150
[amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
dce110_reset_hw_ctx_wrap+0x15f/0x200 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
dce110_apply_ctx_to_hw+0x4b/0x530 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
amdgpu_pm_compute_clocks+0xc9/0x5f0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  dc_commit_state+0x26b/0x590
[amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? __lock_acquire+0x247/0x1910
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? find_held_lock+0x32/0x90
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? find_held_lock+0x32/0x90
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? sched_clock+0x5/0x10
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? mark_held_locks+0x50/0x80
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? __lock_acquire+0x247/0x1910
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? wake_up_klogd+0x37/0x40
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? find_held_lock+0x32/0x90
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? mark_held_locks+0x50/0x80
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
_raw_spin_unlock_irq+0x29/0x40
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
lockdep_hardirqs_on+0xf0/0x180
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
_raw_spin_unlock_irq+0x29/0x40
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
wait_for_completion_timeout+0x75/0x190
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? commit_tail+0x3c/0x70
[drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
amdgpu_dm_audio_eld_notify+0x60/0x60 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  commit_tail+0x3c/0x70
[drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  dm_suspend+0x20/0x60 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_gpu_recover+0x260/0x934 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_job_timedout+0x115/0x140 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_sched_job_timedout+0x44/0xa0 [gpu_sched]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  process_one_work+0x272/0x5a0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  worker_thread+0x50/0x3b0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  kthread+0x108/0x140
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
process_one_work+0x5a0/0x5a0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? kthread_park+0x80/0x80
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ret_from_fork+0x3a/0x50
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: irq event stamp: 82808
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: hardirqs last  enabled at
(82807): [<ffffffffbb1716eb>] console_unlock+0x46b/0x5d0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: hardirqs last disabled at
(82808): [<ffffffffbb0038da>] trace_hardirqs_off_thunk+0x1a/0x20
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: softirqs last  enabled at
(82794): [<ffffffffbbe0035d>] __do_softirq+0x35d/0x45d
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: softirqs last disabled at
(82787): [<ffffffffbb0f2077>] irq_exit+0xf7/0x100
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: ---[ end trace
71731c9cc205c24d ]---
Oct 18 21:51:27 ezra.blanchardmorris.net abrt-dump-journal-oops[1493]:
abrt-dump-journal-oops: Found oopses: 1
Oct 18 21:51:27 ezra.blanchardmorris.net abrt-dump-journal-oops[1493]:
abrt-dump-journal-oops: Creating problem directories
Oct 18 21:51:27 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:28 ezra.blanchardmorris.net abrt-dump-journal-oops[1493]: Reported
1 kernel oopses to Abrt
Oct 18 21:51:29 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:29 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed to
send message: 0x26, ret value: 0x0
Oct 18 21:51:30 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:32 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:32 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x4c, input parameter: 0x1, error code: 0x0
Oct 18 21:51:34 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:35 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:35 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x4c, input parameter: 0x3, error code: 0x0
Oct 18 21:51:37 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:38 ezra.blanchardmorris.net abrt-server[16691]: Can't find a
meaningful backtrace for hashing in '.'
Oct 18 21:51:38 ezra.blanchardmorris.net abrt-server[16691]: Option
'DropNotReportableOopses' is not configured
Oct 18 21:51:38 ezra.blanchardmorris.net abrt-server[16691]: Preserving oops
'.' because DropNotReportableOopses is 'no'
Oct 18 21:51:38 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:38 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed to
send message: 0x63, ret value: 0x0
Oct 18 21:51:40 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:42 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:42 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x9, input parameter: 0xf4, error code: 0x0
Oct 18 21:51:42 ezra.blanchardmorris.net abrt-notification[16713]: System
encountered a non-fatal error in ??()
Oct 18 21:51:43 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:45 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20191019/03b05a85/attachment-0001.html>


More information about the dri-devel mailing list