<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - amdgpu [RX Vega 64] system freeze while gaming"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=109955#c96">Comment # 96</a>
              on <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - amdgpu [RX Vega 64] system freeze while gaming"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=109955">bug 109955</a>
              from <span class="vcard"><a class="email" href="mailto:rodamorris@gmail.com" title="Rodney A Morris <rodamorris@gmail.com>"> <span class="fn">Rodney A Morris</span></a>
</span></b>
        <pre>(In reply to Mauro Gaspari from <a href="show_bug.cgi?id=109955#c90">comment #90</a>)

I am experiencing periodic lockups with various games, including Hearts of Iron
IV, BATTLETECH, and Stellaris all being played through Steam.  Below is the
most recent crash from playing less than 5 minutes of Hearts of Iron IV.



<span class="quote">> 
> OS Info can be taken from neofetch:
> System info:</span >

           /:-------------:\          
       :-------------------::        -------------------------------- 
     :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
   /-----------omMMMNNNMMD  ---:     Kernel: 5.2.11-200.fc30.x86_64+debug 
  :-----------sMMMMNMNMP.    ---:    Uptime: 11 mins 
 :-----------:MMMdP-------    ---\   Packages: 2198 (rpm), 27 (flatpak) 
,------------:MMMd--------    ---:   Shell: bash 5.0.7 
:------------:MMMd-------    .---:   Resolution: 2560x1440 
:----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
:--     .+shhhMMMmhhy++   .------/   WM: GNOME Shell 
:-    -------:MMMd--------------:    WM Theme: Adwaita 
:-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
:-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
:-- :dMNdhhdNMMNo------------;       Terminal: tilix 
:---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
:------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
:---------------------://            Memory: 1666MiB / 32045MiB 

<span class="quote">> 
> Mesa info can be taken from this command:
> glxinfo | grep "OpenGL version" </span >

OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.5

<span class="quote">> 
> Game being played: </span >

Hearts of Iron IV through Steam for Linux

<span class="quote">> Native or Wine or Wine+DXVK:</span >

Native

<span class="quote">> 
> Crash type: Game crash? Full System freeze? System freeze but still can drop
> to tty?</span >

Screen goes black suddenly while music continues plays for less than a minute;
music begins to loop; and computer reboots.

<span class="quote">> 
> DMESG output after the crash:
> sudo dmesg | grep amdgpu</span >

Here is the pertinent part dmesg with kernel debugging turned on.  Some of the
information the crash would not be captured by grepping amdgpu.  Entire dmesg
provided as an attachment.

[46957.810300] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out or interrupted!
[46962.941366] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=2446766, emitted seq=2446767
[46962.941453] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process hoi4 pid 24014 thread hoi4:cs0 pid 24015
[46962.941459] amdgpu 0000:06:00.0: GPU reset begin!

[46962.942698] ======================================================
[46962.942700] WARNING: possible circular locking dependency detected
[46962.942702] 5.2.11-200.fc30.x86_64+debug #1 Not tainted
[46962.942704] ------------------------------------------------------
[46962.942705] kworker/3:0/20416 is trying to acquire lock:
[46962.942708] 00000000a4a3593f (&(&ring->fence_drv.lock)->rlock){-.-.}, at:
dma_fence_remove_callback+0x1a/0x60
[46962.942717] 
               but task is already holding lock:
[46962.942718] 00000000d45cbf2b (&(&sched->job_list_lock)->rlock){-.-.}, at:
drm_sched_stop+0x34/0x130 [gpu_sched]
[46962.942724] 
               which lock already depends on the new lock.

[46962.942725] 
               the existing dependency chain (in reverse order) is:
[46962.942727] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[46962.942735]        _raw_spin_lock_irqsave+0x49/0x83
[46962.942738]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[46962.942741]        dma_fence_signal+0x111/0x1a0
[46962.942794]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[46962.942858]        sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
[46962.942918]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[46962.942978]        amdgpu_ih_process+0x8d/0x110 [amdgpu]
[46962.943038]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[46962.943043]        __handle_irq_event_percpu+0x3f/0x290
[46962.943046]        handle_irq_event_percpu+0x31/0x80
[46962.943048]        handle_irq_event+0x34/0x51
[46962.943053]        handle_edge_irq+0x83/0x1a0
[46962.943057]        handle_irq+0x1c/0x30
[46962.943059]        do_IRQ+0x61/0x120
[46962.943063]        ret_from_intr+0x0/0x22
[46962.943067]        cpuidle_enter_state+0xc9/0x450
[46962.943069]        cpuidle_enter+0x29/0x40
[46962.943074]        do_idle+0x1ec/0x280
[46962.943076]        cpu_startup_entry+0x19/0x20
[46962.943079]        start_secondary+0x189/0x1e0
[46962.943083]        secondary_startup_64+0xa4/0xb0
[46962.943087] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[46962.943095]        lock_acquire+0xa2/0x1b0
[46962.943105]        _raw_spin_lock_irqsave+0x49/0x83
[46962.943109]        dma_fence_remove_callback+0x1a/0x60
[46962.943114]        drm_sched_stop+0x59/0x130 [gpu_sched]
[46962.943225]        amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[46962.943338]        amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46962.943413]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46962.943418]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46962.943421]        process_one_work+0x272/0x5e0
[46962.943423]        worker_thread+0x50/0x3b0
[46962.943427]        kthread+0x108/0x140
[46962.943431]        ret_from_fork+0x3a/0x50
[46962.943432] 
               other info that might help us debug this:

[46962.943435]  Possible unsafe locking scenario:

[46962.943437]        CPU0                    CPU1
[46962.943438]        ----                    ----
[46962.943439]   lock(&(&sched->job_list_lock)->rlock);
[46962.943441]                               
lock(&(&ring->fence_drv.lock)->rlock);
[46962.943443]                               
lock(&(&sched->job_list_lock)->rlock);
[46962.943445]   lock(&(&ring->fence_drv.lock)->rlock);
[46962.943447] 
                *** DEADLOCK ***

[46962.943449] 5 locks held by kworker/3:0/20416:
[46962.943450]  #0: 0000000043c92b99 ((wq_completion)events){+.+.}, at:
process_one_work+0x1e9/0x5e0
[46962.943456]  #1: 000000000c360f0c
((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at:
process_one_work+0x1e9/0x5e0
[46962.943459]  #2: 000000007a135814 (&adev->lock_reset){+.+.}, at:
amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
[46962.943543]  #3: 00000000e83f7d6b (&dqm->lock_hidden){+.+.}, at:
kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[46962.943614]  #4: 00000000d45cbf2b (&(&sched->job_list_lock)->rlock){-.-.},
at: drm_sched_stop+0x34/0x130 [gpu_sched]
[46962.943620] 
               stack backtrace:
[46962.943629] CPU: 3 PID: 20416 Comm: kworker/3:0 Not tainted
5.2.11-200.fc30.x86_64+debug #1
[46962.943631] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[46962.943636] Workqueue: events drm_sched_job_timedout [gpu_sched]
[46962.943638] Call Trace:
[46962.943648]  dump_stack+0x85/0xc0
[46962.943654]  print_circular_bug.cold+0x15c/0x195
[46962.943658]  __lock_acquire+0x167c/0x1c90
[46962.943664]  lock_acquire+0xa2/0x1b0
[46962.943668]  ? dma_fence_remove_callback+0x1a/0x60
[46962.943674]  _raw_spin_lock_irqsave+0x49/0x83
[46962.943677]  ? dma_fence_remove_callback+0x1a/0x60
[46962.943680]  dma_fence_remove_callback+0x1a/0x60
[46962.943684]  drm_sched_stop+0x59/0x130 [gpu_sched]
[46962.943764]  amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[46962.943847]  amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46962.943923]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46962.943930]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46962.943934]  process_one_work+0x272/0x5e0
[46962.943938]  worker_thread+0x50/0x3b0
[46962.943942]  kthread+0x108/0x140
[46962.943945]  ? process_one_work+0x5e0/0x5e0
[46962.943948]  ? kthread_park+0x80/0x80
[46962.943952]  ret_from_fork+0x3a/0x50
[46962.961034] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[46962.961044] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[46962.961048] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[46962.961051] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[46962.961149] pcieport 0000:00:03.0: AER: Device recovery failed
[46963.955209] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout,
signaled seq=95391072, emitted seq=95391072
[46963.955328] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[46963.955336] amdgpu 0000:06:00.0: GPU reset begin!
[46968.050083] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[46973.170223] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0]
hw_done or flip_done timed out
[46983.410080] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[46993.650098] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:45:plane-5] flip_done timed out
[46993.962192] amdgpu: [powerplay] No response from smu
[46993.962195] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[46994.277773] amdgpu: [powerplay] No response from smu
[46994.593416] amdgpu: [powerplay] No response from smu
[46994.593420] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[46994.908354] amdgpu: [powerplay] No response from smu
[46995.223718] amdgpu: [powerplay] No response from smu
[46995.223722] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[46995.286504] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif
line:634
[46995.286506] ------------[ cut here ]------------
[46995.286605] WARNING: CPU: 3 PID: 20416 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:329
generic_reg_wait.cold+0x31/0x53 [amdgpu]
[46995.286606] Modules linked in: vhost_net vhost tap rfcomm xt_CHECKSUM
xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast
xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
ip_tables bnep nct6775 hwmon_vid intel_rapl vfat fat arc4 x86_pkg_temp_thermal
intel_powerclamp coretemp fuse kvm_intel kvm iwlmvm irqbypass iTCO_wdt
iTCO_vendor_support mac80211 crct10dif_pclmul crc32_pclmul
snd_hda_codec_realtek ghash_clmulni_intel intel_cstate snd_hda_codec_generic
iwlwifi snd_hda_codec_hdmi ledtrig_audio intel_uncore snd_hda_intel
intel_rapl_perf cfg80211 snd_hda_codec btusb mxm_wmi snd_hda_core btrtl btbcm
snd_hwdep btintel snd_seq i2c_i801 lpc_ich bluetooth
[46995.286626]  snd_seq_device joydev snd_pcm ecdh_generic snd_timer rfkill ecc
mei_me snd mei soundcore pcc_cpufreq binfmt_misc auth_rpcgss sunrpc amdgpu
amd_iommu_v2 gpu_sched ttm drm_kms_helper crc32c_intel igb uas drm usb_storage
dca mpt3sas i2c_algo_bit e1000e nvme raid_class nvme_core scsi_transport_sas
wmi
[46995.286638] CPU: 3 PID: 20416 Comm: kworker/3:0 Not tainted
5.2.11-200.fc30.x86_64+debug #1
[46995.286639] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[46995.286643] Workqueue: events drm_sched_job_timedout [gpu_sched]
[46995.286682] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[46995.286684] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 78 93 80 c0 e8 45 fd a0
ca 83 7b 20 01 0f 84 27 11 fe ff 48 c7 c7 70 92 80 c0 e8 2f fd a0 ca <0f> 0b e9
14 11 fe ff 48 c7 c7 70 92 80 c0 89 54 24 04 e8 18 fd a0
[46995.286685] RSP: 0018:ffff9cd009b3f728 EFLAGS: 00010246
[46995.286687] RAX: 0000000000000024 RBX: ffff8ada6be8a780 RCX:
0000000000000006
[46995.286688] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff8ada7ebd9c80
[46995.286689] RBP: 000000000000000a R08: 0000000000000001 R09:
0000000000000000
[46995.286690] R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000035af
[46995.286691] R13: 0000000000000dad R14: 0000000000000001 R15:
0000000000000dac
[46995.286692] FS:  0000000000000000(0000) GS:ffff8ada7ea00000(0000)
knlGS:0000000000000000
[46995.286694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46995.286695] CR2: 0000085777c78000 CR3: 00000003cb612005 CR4:
00000000003606e0
[46995.286696] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[46995.286697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[46995.286698] Call Trace:
[46995.286741]  dce_mi_free_dmif+0xef/0x150 [amdgpu]
[46995.286780]  dce110_reset_hw_ctx_wrap+0x14a/0x1e0 [amdgpu]
[46995.286819]  dce110_apply_ctx_to_hw+0x4a/0x490 [amdgpu]
[46995.286843]  ? amdgpu_pm_compute_clocks.part.0+0xcb/0x610 [amdgpu]
[46995.286882]  ? dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
[46995.286920]  dc_commit_state+0x262/0x580 [amdgpu]
[46995.286925]  ? vsnprintf+0x3aa/0x4f0
[46995.286965]  amdgpu_dm_atomic_commit_tail+0xc34/0x1970 [amdgpu]
[46995.286971]  ? console_unlock+0x363/0x5d0
[46995.286976]  ? __irq_work_queue_local+0x50/0x60
[46995.286977]  ? irq_work_queue+0x4d/0x60
[46995.286979]  ? wake_up_klogd+0x37/0x40
[46995.286984]  ? wait_for_completion_timeout+0x4c/0x190
[46995.286987]  ? _raw_spin_unlock_irq+0x29/0x40
[46995.286989]  ? wait_for_completion_timeout+0x75/0x190
[46995.287016]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[46995.287021]  commit_tail+0x3c/0x70 [drm_kms_helper]
[46995.287026]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[46995.287031]  drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
[46995.287035]  drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
[46995.287076]  dm_suspend+0x20/0x60 [amdgpu]
[46995.287098]  amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[46995.287123]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[46995.287164]  amdgpu_device_pre_asic_reset+0x1f7/0x20c [amdgpu]
[46995.287204]  amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46995.287242]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46995.287246]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46995.287249]  process_one_work+0x272/0x5e0
[46995.287252]  worker_thread+0x50/0x3b0
[46995.287256]  kthread+0x108/0x140
[46995.287258]  ? process_one_work+0x5e0/0x5e0
[46995.287260]  ? kthread_park+0x80/0x80
[46995.287263]  ret_from_fork+0x3a/0x50
[46995.287267] irq event stamp: 6288284
[46995.287269] hardirqs last  enabled at (6288283): [<ffffffff8bb04d8b>]
_raw_spin_unlock_irqrestore+0x4b/0x60
[46995.287271] hardirqs last disabled at (6288284): [<ffffffff8bb05533>]
_raw_spin_lock_irqsave+0x23/0x83
[46995.287273] softirqs last  enabled at (6288276): [<ffffffff8be0035d>]
__do_softirq+0x35d/0x468
[46995.287276] softirqs last disabled at (6288269): [<ffffffff8b0f07a2>]
irq_exit+0x102/0x110
[46995.287277] ---[ end trace 6a2158c4cfef5172 ]---
[46995.603082] amdgpu: [powerplay] No response from smu
[46995.918767] amdgpu: [powerplay] No response from smu
[46995.918770] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1,
error code: 0x0
[46996.233769] amdgpu: [powerplay] No response from smu
[46996.549255] amdgpu: [powerplay] No response from smu
[46996.549258] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3,
error code: 0x0
[46996.865320] amdgpu: [powerplay] No response from smu
[46997.181203] amdgpu: [powerplay] No response from smu
[46997.181206] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4,
error code: 0x0
[46997.495804] amdgpu: [powerplay] No response from smu
[46997.811227] amdgpu: [powerplay] No response from smu
[46997.811231] amdgpu: [powerplay] Failed message: 0xa, input parameter:
0xa0b000, error code: 0x0
[46998.126794] amdgpu: [powerplay] No response from smu
[46998.442559] amdgpu: [powerplay] No response from smu
[46998.442561] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[46998.756884] amdgpu: [powerplay] No response from smu
[46999.072680] amdgpu: [powerplay] No response from smu
[46999.072684] amdgpu: [powerplay] Failed message: 0x4, input parameter: 0x400,
error code: 0x0
[46999.388310] amdgpu: [powerplay] No response from smu
[46999.704067] amdgpu: [powerplay] No response from smu
[46999.704069] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[47000.019626] amdgpu: [powerplay] No response from smu
[47000.334247] amdgpu: [powerplay] No response from smu
[47000.334251] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[47000.350026] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.350043] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.350052] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.350061] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.350202] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.367437] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.367443] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.367444] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.367446] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.367486] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.384977] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.384982] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.384983] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.384985] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.385055] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.402521] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.402530] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.402532] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.402535] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.402578] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.420068] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.420079] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.420085] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.420090] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.420186] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.437608] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.437617] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.437621] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.437625] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.437726] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.455143] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.455151] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.455154] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.455157] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.455209] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.472688] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.472698] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.472703] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.472708] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.472826] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.490225] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.490232] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.490236] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.490239] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.490289] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.507760] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
[47000.735787] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.735791] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.735793] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.735824] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.735826] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0


<span class="quote">> systemd logs output after the crash (If your system froze and you get logs
> after reboot):</span >

Sep 06 08:36:58 ezra.blanchardmorris.net kernel: Command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:36:58 ezra.blanchardmorris.net kernel: Kernel command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:36:59 ezra.blanchardmorris.net dracut-cmdline[361]: Using kernel
command line parameters:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu kernel
modesetting enabled.
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 5: 0xfb600000 -> 0xfb67ffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: fb0: switching to amdgpudrmfb
from EFI VGA
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: vgaarb:
deactivate vga console
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: No more
image in the PCI ROM
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: VRAM:
8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GART:
512M 0x0000000000000000 - 0x000000001FFFFFFF
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: AGP:
267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of VRAM
memory ready
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of GTT
memory ready.
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: fbcon: amdgpudrmfb (fb0) is
primary device
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: fb0:
amdgpudrmfb frame buffer device
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring gfx
uses VM inv eng 0 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma0 uses VM inv eng 0 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page0 uses VM inv eng 1 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma1 uses VM inv eng 4 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page1 uses VM inv eng 5 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_0 uses VM inv eng 6 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.0 uses VM inv eng 7 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.1 uses VM inv eng 8 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce0
uses VM inv eng 9 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce1
uses VM inv eng 10 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce2
uses VM inv eng 11 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: [drm] Initialized amdgpu
3.32.0 20150101 for 0000:06:00.0 on minor 0
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:
Kernel command line: BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:     
   loading driver: amdgpu
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (==)
Matched amdgpu as autoconfigured driver 0
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
LoadModule: "amdgpu"
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
Module amdgpu: vendor="X.Org Foundation"
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:     
   All GPUs supported by the amdgpu kernel driver
Sep 06 16:13:18 ezra.blanchardmorris.net net.lutris.Lutris.desktop[2234]:
2019-09-06 16:13:18,530: GPU: 1002:687F 1002:0B36 using amdgpu drivers
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed
out or interrupted!
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2446766, emitted seq=2446767
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process hoi4 pid 24014 thread hoi4:cs0
pid 24015
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_fence_process+0xa3/0x100 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_ih_process+0x8d/0x110 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_irq_handler+0x1b/0x50 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_job_timedout+0x109/0x130 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:  #2: 000000007a135814
(&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:  #3: 00000000e83f7d6b
(&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: 
amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: 
amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: 
amdgpu_job_timedout+0x109/0x130 [amdgpu]
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring page1 timeout, signaled seq=95391072, emitted
seq=95391072
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 06 21:39:49 ezra.blanchardmorris.net kernel: [drm:amdgpu_dm_atomic_check
[amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0xe, input parameter: 0x0, error code: 0x0
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x42, input parameter: 0x1, error code: 0x0
Sep 06 21:40:11 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu

I will try to run apitrace on Hearts of Iron IV to try to capture more
information.  Please let me know if I can be of further assistance in squashing
this annoying bug, like providing crash information with the mesa debug
packages installed.</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are the assignee for the bug.</li>
      </ul>
    </body>
</html>