<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - amdgpu [RX Vega 64] system freeze while gaming"
href="https://bugs.freedesktop.org/show_bug.cgi?id=109955#c96">Comment # 96</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - amdgpu [RX Vega 64] system freeze while gaming"
href="https://bugs.freedesktop.org/show_bug.cgi?id=109955">bug 109955</a>
from <span class="vcard"><a class="email" href="mailto:rodamorris@gmail.com" title="Rodney A Morris <rodamorris@gmail.com>"> <span class="fn">Rodney A Morris</span></a>
</span></b>
<pre>(In reply to Mauro Gaspari from <a href="show_bug.cgi?id=109955#c90">comment #90</a>)
I am experiencing periodic lockups with various games, including Hearts of Iron
IV, BATTLETECH, and Stellaris, all played through Steam. Below is the most
recent crash, which occurred after less than 5 minutes of playing Hearts of
Iron IV.
<span class="quote">>
> OS Info can be taken from neofetch:
> System info:</span >
OS: Fedora release 30 (Thirty) x86_64
Kernel: 5.2.11-200.fc30.x86_64+debug
Uptime: 11 mins
Packages: 2198 (rpm), 27 (flatpak)
Shell: bash 5.0.7
Resolution: 2560x1440
DE: GNOME 3.32.2
WM: GNOME Shell
WM Theme: Adwaita
Theme: Adapta-Nokto-Eta [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: tilix
CPU: Intel i7-6850K (12) @ 4.000GHz
GPU: AMD ATI Radeon RX Vega 56/64
Memory: 1666MiB / 32045MiB
<span class="quote">>
> Mesa info can be taken from this command:
> glxinfo | grep "OpenGL version" </span >
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.5
<span class="quote">>
> Game being played: </span >
Hearts of Iron IV through Steam for Linux
<span class="quote">> Native or Wine or Wine+DXVK:</span >
Native
<span class="quote">>
> Crash type: Game crash? Full System freeze? System freeze but still can drop
> to tty?</span >
Screen suddenly goes black while the music continues to play for less than a
minute; the music begins to loop; and the computer reboots.
<span class="quote">>
> DMESG output after the crash:
> sudo dmesg | grep amdgpu</span >
Here is the pertinent part of dmesg with kernel debugging turned on. Some of the
information about the crash would not be captured by grepping for amdgpu. The
entire dmesg is provided as an attachment.
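
Since a plain grep for amdgpu drops the scheduler, lockdep, and PCIe AER lines,
a wider filter may be useful to others collecting the same data. The following
is only a rough sketch: the keyword list is my own guess based on the log
below, and filter_dmesg is a hypothetical helper name, not part of any tool.

```python
import re

# Hypothetical keyword list, chosen from the lines that appear in this log.
# A plain "grep amdgpu" would miss the gpu_sched, lockdep and pcieport lines.
PATTERN = re.compile(
    r"amdgpu|gpu_sched|drm|pcieport|AER|locking dependency|DEADLOCK"
)


def filter_dmesg(lines):
    """Return only the lines that contain one of the crash-related keywords."""
    return [line for line in lines if PATTERN.search(line)]
```

To use it on live output, the lines of "sudo dmesg" could be read from
sys.stdin and passed to filter_dmesg; the keyword list would likely need
adjusting for other hardware.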
[46957.810300] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out or interrupted!
[46962.941366] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=2446766, emitted seq=2446767
[46962.941453] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process hoi4 pid 24014 thread hoi4:cs0 pid 24015
[46962.941459] amdgpu 0000:06:00.0: GPU reset begin!
[46962.942698] ======================================================
[46962.942700] WARNING: possible circular locking dependency detected
[46962.942702] 5.2.11-200.fc30.x86_64+debug #1 Not tainted
[46962.942704] ------------------------------------------------------
[46962.942705] kworker/3:0/20416 is trying to acquire lock:
[46962.942708] 00000000a4a3593f (&(&ring->fence_drv.lock)->rlock){-.-.}, at:
dma_fence_remove_callback+0x1a/0x60
[46962.942717]
but task is already holding lock:
[46962.942718] 00000000d45cbf2b (&(&sched->job_list_lock)->rlock){-.-.}, at:
drm_sched_stop+0x34/0x130 [gpu_sched]
[46962.942724]
which lock already depends on the new lock.
[46962.942725]
the existing dependency chain (in reverse order) is:
[46962.942727]
-> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[46962.942735] _raw_spin_lock_irqsave+0x49/0x83
[46962.942738] drm_sched_process_job+0x4d/0x180 [gpu_sched]
[46962.942741] dma_fence_signal+0x111/0x1a0
[46962.942794] amdgpu_fence_process+0xa3/0x100 [amdgpu]
[46962.942858] sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
[46962.942918] amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[46962.942978] amdgpu_ih_process+0x8d/0x110 [amdgpu]
[46962.943038] amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[46962.943043] __handle_irq_event_percpu+0x3f/0x290
[46962.943046] handle_irq_event_percpu+0x31/0x80
[46962.943048] handle_irq_event+0x34/0x51
[46962.943053] handle_edge_irq+0x83/0x1a0
[46962.943057] handle_irq+0x1c/0x30
[46962.943059] do_IRQ+0x61/0x120
[46962.943063] ret_from_intr+0x0/0x22
[46962.943067] cpuidle_enter_state+0xc9/0x450
[46962.943069] cpuidle_enter+0x29/0x40
[46962.943074] do_idle+0x1ec/0x280
[46962.943076] cpu_startup_entry+0x19/0x20
[46962.943079] start_secondary+0x189/0x1e0
[46962.943083] secondary_startup_64+0xa4/0xb0
[46962.943087]
-> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[46962.943095] lock_acquire+0xa2/0x1b0
[46962.943105] _raw_spin_lock_irqsave+0x49/0x83
[46962.943109] dma_fence_remove_callback+0x1a/0x60
[46962.943114] drm_sched_stop+0x59/0x130 [gpu_sched]
[46962.943225] amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[46962.943338] amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46962.943413] amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46962.943418] drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46962.943421] process_one_work+0x272/0x5e0
[46962.943423] worker_thread+0x50/0x3b0
[46962.943427] kthread+0x108/0x140
[46962.943431] ret_from_fork+0x3a/0x50
[46962.943432]
other info that might help us debug this:
[46962.943435] Possible unsafe locking scenario:
[46962.943437] CPU0 CPU1
[46962.943438] ---- ----
[46962.943439] lock(&(&sched->job_list_lock)->rlock);
[46962.943441]
lock(&(&ring->fence_drv.lock)->rlock);
[46962.943443]
lock(&(&sched->job_list_lock)->rlock);
[46962.943445] lock(&(&ring->fence_drv.lock)->rlock);
[46962.943447]
*** DEADLOCK ***
[46962.943449] 5 locks held by kworker/3:0/20416:
[46962.943450] #0: 0000000043c92b99 ((wq_completion)events){+.+.}, at:
process_one_work+0x1e9/0x5e0
[46962.943456] #1: 000000000c360f0c
((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at:
process_one_work+0x1e9/0x5e0
[46962.943459] #2: 000000007a135814 (&adev->lock_reset){+.+.}, at:
amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
[46962.943543] #3: 00000000e83f7d6b (&dqm->lock_hidden){+.+.}, at:
kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[46962.943614] #4: 00000000d45cbf2b (&(&sched->job_list_lock)->rlock){-.-.},
at: drm_sched_stop+0x34/0x130 [gpu_sched]
[46962.943620]
stack backtrace:
[46962.943629] CPU: 3 PID: 20416 Comm: kworker/3:0 Not tainted
5.2.11-200.fc30.x86_64+debug #1
[46962.943631] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[46962.943636] Workqueue: events drm_sched_job_timedout [gpu_sched]
[46962.943638] Call Trace:
[46962.943648] dump_stack+0x85/0xc0
[46962.943654] print_circular_bug.cold+0x15c/0x195
[46962.943658] __lock_acquire+0x167c/0x1c90
[46962.943664] lock_acquire+0xa2/0x1b0
[46962.943668] ? dma_fence_remove_callback+0x1a/0x60
[46962.943674] _raw_spin_lock_irqsave+0x49/0x83
[46962.943677] ? dma_fence_remove_callback+0x1a/0x60
[46962.943680] dma_fence_remove_callback+0x1a/0x60
[46962.943684] drm_sched_stop+0x59/0x130 [gpu_sched]
[46962.943764] amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[46962.943847] amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46962.943923] amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46962.943930] drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46962.943934] process_one_work+0x272/0x5e0
[46962.943938] worker_thread+0x50/0x3b0
[46962.943942] kthread+0x108/0x140
[46962.943945] ? process_one_work+0x5e0/0x5e0
[46962.943948] ? kthread_park+0x80/0x80
[46962.943952] ret_from_fork+0x3a/0x50
[46962.961034] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[46962.961044] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[46962.961048] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[46962.961051] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[46962.961149] pcieport 0000:00:03.0: AER: Device recovery failed
[46963.955209] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout,
signaled seq=95391072, emitted seq=95391072
[46963.955328] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process pid 0 thread pid 0
[46963.955336] amdgpu 0000:06:00.0: GPU reset begin!
[46968.050083] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[46973.170223] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0]
hw_done or flip_done timed out
[46983.410080] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[46993.650098] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:45:plane-5] flip_done timed out
[46993.962192] amdgpu: [powerplay] No response from smu
[46993.962195] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[46994.277773] amdgpu: [powerplay] No response from smu
[46994.593416] amdgpu: [powerplay] No response from smu
[46994.593420] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[46994.908354] amdgpu: [powerplay] No response from smu
[46995.223718] amdgpu: [powerplay] No response from smu
[46995.223722] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[46995.286504] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif
line:634
[46995.286506] ------------[ cut here ]------------
[46995.286605] WARNING: CPU: 3 PID: 20416 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:329
generic_reg_wait.cold+0x31/0x53 [amdgpu]
[46995.286606] Modules linked in: vhost_net vhost tap rfcomm xt_CHECKSUM
xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast
xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
ip_tables bnep nct6775 hwmon_vid intel_rapl vfat fat arc4 x86_pkg_temp_thermal
intel_powerclamp coretemp fuse kvm_intel kvm iwlmvm irqbypass iTCO_wdt
iTCO_vendor_support mac80211 crct10dif_pclmul crc32_pclmul
snd_hda_codec_realtek ghash_clmulni_intel intel_cstate snd_hda_codec_generic
iwlwifi snd_hda_codec_hdmi ledtrig_audio intel_uncore snd_hda_intel
intel_rapl_perf cfg80211 snd_hda_codec btusb mxm_wmi snd_hda_core btrtl btbcm
snd_hwdep btintel snd_seq i2c_i801 lpc_ich bluetooth
[46995.286626] snd_seq_device joydev snd_pcm ecdh_generic snd_timer rfkill ecc
mei_me snd mei soundcore pcc_cpufreq binfmt_misc auth_rpcgss sunrpc amdgpu
amd_iommu_v2 gpu_sched ttm drm_kms_helper crc32c_intel igb uas drm usb_storage
dca mpt3sas i2c_algo_bit e1000e nvme raid_class nvme_core scsi_transport_sas
wmi
[46995.286638] CPU: 3 PID: 20416 Comm: kworker/3:0 Not tainted
5.2.11-200.fc30.x86_64+debug #1
[46995.286639] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[46995.286643] Workqueue: events drm_sched_job_timedout [gpu_sched]
[46995.286682] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[46995.286684] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 78 93 80 c0 e8 45 fd a0
ca 83 7b 20 01 0f 84 27 11 fe ff 48 c7 c7 70 92 80 c0 e8 2f fd a0 ca <0f> 0b e9
14 11 fe ff 48 c7 c7 70 92 80 c0 89 54 24 04 e8 18 fd a0
[46995.286685] RSP: 0018:ffff9cd009b3f728 EFLAGS: 00010246
[46995.286687] RAX: 0000000000000024 RBX: ffff8ada6be8a780 RCX:
0000000000000006
[46995.286688] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff8ada7ebd9c80
[46995.286689] RBP: 000000000000000a R08: 0000000000000001 R09:
0000000000000000
[46995.286690] R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000035af
[46995.286691] R13: 0000000000000dad R14: 0000000000000001 R15:
0000000000000dac
[46995.286692] FS: 0000000000000000(0000) GS:ffff8ada7ea00000(0000)
knlGS:0000000000000000
[46995.286694] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46995.286695] CR2: 0000085777c78000 CR3: 00000003cb612005 CR4:
00000000003606e0
[46995.286696] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[46995.286697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[46995.286698] Call Trace:
[46995.286741] dce_mi_free_dmif+0xef/0x150 [amdgpu]
[46995.286780] dce110_reset_hw_ctx_wrap+0x14a/0x1e0 [amdgpu]
[46995.286819] dce110_apply_ctx_to_hw+0x4a/0x490 [amdgpu]
[46995.286843] ? amdgpu_pm_compute_clocks.part.0+0xcb/0x610 [amdgpu]
[46995.286882] ? dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
[46995.286920] dc_commit_state+0x262/0x580 [amdgpu]
[46995.286925] ? vsnprintf+0x3aa/0x4f0
[46995.286965] amdgpu_dm_atomic_commit_tail+0xc34/0x1970 [amdgpu]
[46995.286971] ? console_unlock+0x363/0x5d0
[46995.286976] ? __irq_work_queue_local+0x50/0x60
[46995.286977] ? irq_work_queue+0x4d/0x60
[46995.286979] ? wake_up_klogd+0x37/0x40
[46995.286984] ? wait_for_completion_timeout+0x4c/0x190
[46995.286987] ? _raw_spin_unlock_irq+0x29/0x40
[46995.286989] ? wait_for_completion_timeout+0x75/0x190
[46995.287016] ? commit_tail+0x3c/0x70 [drm_kms_helper]
[46995.287021] commit_tail+0x3c/0x70 [drm_kms_helper]
[46995.287026] drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[46995.287031] drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
[46995.287035] drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
[46995.287076] dm_suspend+0x20/0x60 [amdgpu]
[46995.287098] amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[46995.287123] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[46995.287164] amdgpu_device_pre_asic_reset+0x1f7/0x20c [amdgpu]
[46995.287204] amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46995.287242] amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46995.287246] drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46995.287249] process_one_work+0x272/0x5e0
[46995.287252] worker_thread+0x50/0x3b0
[46995.287256] kthread+0x108/0x140
[46995.287258] ? process_one_work+0x5e0/0x5e0
[46995.287260] ? kthread_park+0x80/0x80
[46995.287263] ret_from_fork+0x3a/0x50
[46995.287267] irq event stamp: 6288284
[46995.287269] hardirqs last enabled at (6288283): [<ffffffff8bb04d8b>]
_raw_spin_unlock_irqrestore+0x4b/0x60
[46995.287271] hardirqs last disabled at (6288284): [<ffffffff8bb05533>]
_raw_spin_lock_irqsave+0x23/0x83
[46995.287273] softirqs last enabled at (6288276): [<ffffffff8be0035d>]
__do_softirq+0x35d/0x468
[46995.287276] softirqs last disabled at (6288269): [<ffffffff8b0f07a2>]
irq_exit+0x102/0x110
[46995.287277] ---[ end trace 6a2158c4cfef5172 ]---
[46995.603082] amdgpu: [powerplay] No response from smu
[46995.918767] amdgpu: [powerplay] No response from smu
[46995.918770] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1,
error code: 0x0
[46996.233769] amdgpu: [powerplay] No response from smu
[46996.549255] amdgpu: [powerplay] No response from smu
[46996.549258] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3,
error code: 0x0
[46996.865320] amdgpu: [powerplay] No response from smu
[46997.181203] amdgpu: [powerplay] No response from smu
[46997.181206] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4,
error code: 0x0
[46997.495804] amdgpu: [powerplay] No response from smu
[46997.811227] amdgpu: [powerplay] No response from smu
[46997.811231] amdgpu: [powerplay] Failed message: 0xa, input parameter:
0xa0b000, error code: 0x0
[46998.126794] amdgpu: [powerplay] No response from smu
[46998.442559] amdgpu: [powerplay] No response from smu
[46998.442561] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[46998.756884] amdgpu: [powerplay] No response from smu
[46999.072680] amdgpu: [powerplay] No response from smu
[46999.072684] amdgpu: [powerplay] Failed message: 0x4, input parameter: 0x400,
error code: 0x0
[46999.388310] amdgpu: [powerplay] No response from smu
[46999.704067] amdgpu: [powerplay] No response from smu
[46999.704069] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[47000.019626] amdgpu: [powerplay] No response from smu
[47000.334247] amdgpu: [powerplay] No response from smu
[47000.334251] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[47000.350026] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.350043] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.350052] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.350061] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.350202] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.367437] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.367443] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.367444] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.367446] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.367486] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.384977] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.384982] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.384983] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.384985] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.385055] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.402521] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.402530] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.402532] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.402535] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.402578] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.420068] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.420079] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.420085] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.420090] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.420186] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.437608] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.437617] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.437621] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.437625] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.437726] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.455143] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.455151] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.455154] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.455157] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.455209] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.472688] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.472698] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.472703] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.472708] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.472826] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.490225] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.490232] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.490236] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.490239] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.490289] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.507760] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
[47000.735787] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.735791] pcieport 0000:00:03.0: AER: device [8086:6f08] error
status/mask=00004000/00000000
[47000.735793] pcieport 0000:00:03.0: AER: [14] CmpltTO
(First)
[47000.735824] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.735826] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
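
The AER lines above report status/mask=00004000/00000000 together with
"[14] CmpltTO". As a small sketch of how that value can be read, the snippet
below decodes the set bits of the status word. The bit names follow my reading
of the PCIe AER uncorrectable error status register and the table is
deliberately incomplete; decode_aer_status is a hypothetical helper, not a
kernel or library function.

```python
# Partial table of uncorrectable error status bits (assumed from the PCIe AER
# register layout); bits not listed here fall back to a generic "bit N" label.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    12: "Poisoned TLP",
    14: "Completion Timeout (CmpltTO)",
    15: "Completer Abort",
    16: "Unexpected Completion",
    18: "Malformed TLP",
    20: "Unsupported Request",
}


def decode_aer_status(status_hex, mask_hex="00000000"):
    """Return the names of error bits that are set in status and not masked."""
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    names = []
    for bit in range(32):
        is_set = (status // 2**bit) % 2 == 1
        is_masked = (mask // 2**bit) % 2 == 1
        if is_set and not is_masked:
            names.append(UNCORRECTABLE_BITS.get(bit, "bit %d" % bit))
    return names


print(decode_aer_status("00004000", "00000000"))
```

For the values in this log, only bit 14 is set, which matches the CmpltTO
(completion timeout) text printed by the kernel.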
<span class="quote">> systemd logs output after the crash (If your system froze and you get logs
> after reboot):</span >
Sep 06 08:36:58 ezra.blanchardmorris.net kernel: Command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:36:58 ezra.blanchardmorris.net kernel: Kernel command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:36:59 ezra.blanchardmorris.net dracut-cmdline[361]: Using kernel
command line parameters:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu kernel
modesetting enabled.
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 5: 0xfb600000 -> 0xfb67ffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: fb0: switching to amdgpudrmfb
from EFI VGA
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: vgaarb:
deactivate vga console
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: No more
image in the PCI ROM
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: VRAM:
8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GART:
512M 0x0000000000000000 - 0x000000001FFFFFFF
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: AGP:
267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of VRAM
memory ready
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of GTT
memory ready.
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: fbcon: amdgpudrmfb (fb0) is
primary device
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: fb0:
amdgpudrmfb frame buffer device
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring gfx
uses VM inv eng 0 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma0 uses VM inv eng 0 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page0 uses VM inv eng 1 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma1 uses VM inv eng 4 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page1 uses VM inv eng 5 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_0 uses VM inv eng 6 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.0 uses VM inv eng 7 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.1 uses VM inv eng 8 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce0
uses VM inv eng 9 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce1
uses VM inv eng 10 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce2
uses VM inv eng 11 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: [drm] Initialized amdgpu
3.32.0 20150101 for 0000:06:00.0 on minor 0
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:
Kernel command line: BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:
loading driver: amdgpu
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (==)
Matched amdgpu as autoconfigured driver 0
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
LoadModule: "amdgpu"
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
Module amdgpu: vendor="X.Org Foundation"
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:
All GPUs supported by the amdgpu kernel driver
Sep 06 16:13:18 ezra.blanchardmorris.net net.lutris.Lutris.desktop[2234]:
2019-09-06 16:13:18,530: GPU: 1002:687F 1002:0B36 using amdgpu drivers
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed
out or interrupted!
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2446766, emitted seq=2446767
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process hoi4 pid 24014 thread hoi4:cs0
pid 24015
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_fence_process+0xa3/0x100 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_ih_process+0x8d/0x110 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_irq_handler+0x1b/0x50 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_job_timedout+0x109/0x130 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: #2: 000000007a135814
(&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: #3: 00000000e83f7d6b
(&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
amdgpu_job_timedout+0x109/0x130 [amdgpu]
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring page1 timeout, signaled seq=95391072, emitted
seq=95391072
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 06 21:39:49 ezra.blanchardmorris.net kernel: [drm:amdgpu_dm_atomic_check
[amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0xe, input parameter: 0x0, error code: 0x0
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x42, input parameter: 0x1, error code: 0x0
Sep 06 21:40:11 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
I will try to run apitrace on Hearts of Iron IV to capture more information.
Please let me know if I can be of further assistance in squashing this annoying
bug, for example by providing crash information with the Mesa debug packages
installed.</pre>
</div>
</p>
</body>
</html>