[Bug 205585] [Regression] [amdgpu] AMD Vega 64 GPU invalid access and EEH under load

bugzilla-daemon at bugzilla.kernel.org
Fri Nov 29 07:45:43 UTC 2019


https://bugzilla.kernel.org/show_bug.cgi?id=205585

--- Comment #3 from Timothy Pearson (tpearson at raptorengineering.com) ---
I just had a chance to test on 5.4.0, and it still fails. I haven't had a
chance to bisect yet, but I suspect it is related to the 64-bit enablement on
POWER in 5.4 more than anything else.

The EEH is quite strange; the PEST register decodes as:
MMIO
CFG Read
Other Transaction Type
An MMIO Load, MMIO I/O Write, or other transaction returned from the PCIe link
with a status of Unsupported Request (UR)
Failure address: 0x000000000000

Full trace:

[20341.276752702,3] PHB#0033[8:3]: PHB Freeze/Fence detected !
[20341.276848173,3] PHB#0033[8:3]:             PCI FIR=2000000000000000
[20341.276900504,3] PHB#0033[8:3]:         PCI FIR WOF=2000000000000000
[20341.276939625,3] PHB#0033[8:3]:            NEST FIR=0000800000000000
[20341.276979866,3] PHB#0033[8:3]:        NEST FIR WOF=0000800000000000
[20341.277023394,3] PHB#0033[8:3]:            ERR RPT0=0000000000000001
[20341.277068184,3] PHB#0033[8:3]:            ERR RPT1=0000000000000000
[20341.277110812,3] PHB#0033[8:3]:             AIB ERR=0000200000000000
[20341.277830701,3] PHB#0033[8:3]:                  brdgCtl = 00000002
[20341.277906614,3] PHB#0033[8:3]:             deviceStatus = 00000020
[20341.277946469,3] PHB#0033[8:3]:               slotStatus = 00402000
[20341.277981186,3] PHB#0033[8:3]:               linkStatus = e9010008
[20341.278025974,3] PHB#0033[8:3]:             devCmdStatus = 00100107
[20341.278068859,3] PHB#0033[8:3]:             devSecStatus = 00000000
[20341.278109829,3] PHB#0033[8:3]:          rootErrorStatus = 00000000
[20341.278149196,3] PHB#0033[8:3]:          corrErrorStatus = 00000000
[20341.278190145,3] PHB#0033[8:3]:        uncorrErrorStatus = 00000000
[20341.278223684,3] PHB#0033[8:3]:                   devctl = 00000020
[20341.278276525,3] PHB#0033[8:3]:                  devStat = 00000000
[20341.278314241,3] PHB#0033[8:3]:                  tlpHdr1 = 00000000
[20341.278356746,3] PHB#0033[8:3]:                  tlpHdr2 = 00000000
[20341.278397163,3] PHB#0033[8:3]:                  tlpHdr3 = 00000000
[20341.278440709,3] PHB#0033[8:3]:                  tlpHdr4 = 00000000
[20341.278478424,3] PHB#0033[8:3]:                 sourceId = 00000000
[20341.278516547,3] PHB#0033[8:3]:                     nFir = 0000800000000000
[20341.278555975,3] PHB#0033[8:3]:                 nFirMask = 0030001c00000000
[20341.278598653,3] PHB#0033[8:3]:                  nFirWOF = 0000800000000000
[20341.278642004,3] PHB#0033[8:3]:                 phbPlssr = 0000001800000000
[20341.278686870,3] PHB#0033[8:3]:                   phbCsr = 0000001800000000
[20341.278731874,3] PHB#0033[8:3]:                   lemFir = 0004000100000100
[20341.278776158,3] PHB#0033[8:3]:             lemErrorMask = 0000000000000000
[20341.278815229,3] PHB#0033[8:3]:                   lemWOF = 0000000100000000
[20341.278857015,3] PHB#0033[8:3]:           phbErrorStatus = 000005a000000000
[20341.278909821,3] PHB#0033[8:3]:      phbFirstErrorStatus = 0000002000000000
[20341.278951950,3] PHB#0033[8:3]:             phbErrorLog0 = 2148000098000240
[20341.278999524,3] PHB#0033[8:3]:             phbErrorLog1 = a008400000000000
[20341.279042839,3] PHB#0033[8:3]:        phbTxeErrorStatus = 0000200000000000
[20341.279081676,3] PHB#0033[8:3]:   phbTxeFirstErrorStatus = 0000200000000000
[20341.279120945,3] PHB#0033[8:3]:          phbTxeErrorLog0 = 4000000000000000
[20341.279160833,3] PHB#0033[8:3]:          phbTxeErrorLog1 = 0000000000000000
[20341.279207802,3] PHB#0033[8:3]:     phbRxeArbErrorStatus = 0000000000000000
[20341.279254658,3] PHB#0033[8:3]: phbRxeArbFrstErrorStatus = 0000000000000000
[20341.279297181,3] PHB#0033[8:3]:       phbRxeArbErrorLog0 = 0000000000000000
[20341.279334227,3] PHB#0033[8:3]:       phbRxeArbErrorLog1 = 0000000000000000
[20341.279376968,3] PHB#0033[8:3]:     phbRxeMrgErrorStatus = 0000000000000001
[20341.279420726,3] PHB#0033[8:3]: phbRxeMrgFrstErrorStatus = 0000000000000001
[20341.279469009,3] PHB#0033[8:3]:       phbRxeMrgErrorLog0 = 0000000000000000
[20341.279512839,3] PHB#0033[8:3]:       phbRxeMrgErrorLog1 = 0000000000000000
[20341.279561496,3] PHB#0033[8:3]:     phbRxeTceErrorStatus = 0000000000000000
[20341.279604696,3] PHB#0033[8:3]: phbRxeTceFrstErrorStatus = 0000000000000000
[20341.279645952,3] PHB#0033[8:3]:       phbRxeTceErrorLog0 = 0000000000000000
[20341.279685644,3] PHB#0033[8:3]:       phbRxeTceErrorLog1 = 0000000000000000
[20341.279731458,3] PHB#0033[8:3]:        phbPblErrorStatus = 0000000000000800
[20341.279778323,3] PHB#0033[8:3]:   phbPblFirstErrorStatus = 0000000000000800
[20341.279825433,3] PHB#0033[8:3]:          phbPblErrorLog0 = 0000000000000000
[20341.279866852,3] PHB#0033[8:3]:          phbPblErrorLog1 = 00000000028de410
[20341.279903104,3] PHB#0033[8:3]:      phbPcieDlpErrorLog1 = 0000000000000000
[20341.279942888,3] PHB#0033[8:3]:      phbPcieDlpErrorLog2 = 0000000000000000
[20341.279984925,3] PHB#0033[8:3]:    phbPcieDlpErrorStatus = 0000000000000000
[20341.280033282,3] PHB#0033[8:3]:       phbRegbErrorStatus = 0010001000000000
[20341.280080310,3] PHB#0033[8:3]:  phbRegbFirstErrorStatus = 0000001000000000
[20341.280126330,3] PHB#0033[8:3]:         phbRegbErrorLog0 = 4800003c00000000
[20341.280173657,3] PHB#0033[8:3]:         phbRegbErrorLog1 = 0000000000000200
[20341.280218925,3] PHB#0033[8:3]:                PEST[1ff] = 3740002a01000000 0000000000000000
[ 1580.231935] EEH: PHB#33 failure detected, location: N/A
[ 1580.231958] EEH: Frozen PHB#33-PE#0 detected
[ 1580.231969] EEH: Call Trace:
[ 1580.231983] EEH: [00000000741e7c92] __eeh_send_failure_event+0x78/0x150
[ 1580.232006] EEH: [0000000019c0a3ea] eeh_dev_check_failure+0x1d8/0x6b0
[ 1580.232019] EEH: [00000000d1114f7e] eeh_check_failure+0x98/0x100
[ 1580.232080] EEH: [0000000026fdad67] amdgpu_mm_rreg+0x20c/0x250 [amdgpu]
[ 1580.232134] EEH: [0000000087736ee4] vi_flush_hdp+0xa0/0xc0 [amdgpu]
[ 1580.232191] EEH: [000000000b00465e] amdgpu_gart_bind+0x78/0x140 [amdgpu]
[ 1580.232247] EEH: [00000000e410157a] amdgpu_ttm_gart_bind+0x124/0x140
[amdgpu]
[ 1580.232295] EEH: [0000000027696b17] amdgpu_ttm_alloc_gart+0x19c/0x230
[amdgpu]
[ 1580.232350] EEH: [00000000abff626d] amdgpu_vm_sdma_map_table+0x4c/0x70
[amdgpu]
[ 1580.232411] EEH: [000000003babc62e] amdgpu_vm_clear_bo+0x188/0x460 [amdgpu]
[ 1580.232460] EEH: [000000003135d9d5] amdgpu_vm_update_ptes+0x300/0x5f0
[amdgpu]
[ 1580.232513] EEH: [00000000a9b62a4c] amdgpu_vm_bo_update_mapping+0x100/0x140
[amdgpu]
[ 1580.232565] EEH: [00000000c53ee852] amdgpu_vm_bo_update+0x348/0x8a0 [amdgpu]
[ 1580.232614] EEH: [00000000e468e987] amdgpu_gem_va_ioctl+0x5c4/0x620 [amdgpu]
[ 1580.232644] EEH: [000000002c0a19e7] drm_ioctl_kernel+0xfc/0x180 [drm]
[ 1580.232671] EEH: [000000005cb0f244] drm_ioctl+0x238/0x480 [drm]
[ 1580.232725] EEH: [00000000b812c3a6] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
[ 1580.232749] EEH: [000000004de566d7] do_vfs_ioctl+0xe0/0xac0
[ 1580.232770] EEH: [0000000045206404] ksys_ioctl+0xc4/0x110
[ 1580.232782] EEH: [000000001e273b3a] sys_ioctl+0x28/0x80
[ 1580.232804] EEH: [00000000aa248bf4] system_call+0x5c/0x68
[ 1580.232834] EEH: This PCI device has failed 1 times in the last hour and
will be permanently disabled after 5 failures.
[ 1580.232880] EEH: Notify device drivers to shutdown
[ 1580.232911] EEH: Beginning: 'error_detected(IO frozen)'
[ 1580.232933] PCI 0033:00:00.0#01fe: EEH: no driver
[ 1580.232935] PCI 0033:01:00.0#0000: EEH: driver not EEH aware
[ 1580.232957] PCI 0033:01:00.1#0000: EEH: driver not EEH aware
[ 1580.232970] EEH: Finished:'error_detected(IO frozen)' with aggregate
recovery state:'none'
[ 1580.232998] EEH: Collect temporary log
[ 1580.233008] PHB4 PHB#51 Diag-data (Version: 1)
[ 1580.233018] brdgCtl:    00000002
[ 1580.233028] RootSts:    00000020 00402000 e9010008 00100107 00000000
[ 1580.233040] nFir:       0000800000000000 0030001c00000000 0000800000000000
[ 1580.233062] PhbSts:     0000001800000000 0000001800000000
[ 1580.233082] Lem:        0004000100000100 0000000000000000 0000000100000000
[ 1580.233104] PhbErr:     000005a000000000 0000002000000000 2148000098000240
a008400000000000
[ 1580.233136] PhbTxeErr:  0000200000000000 0000200000000000 4000000000000000
0000000000000000
[ 1580.233169] RxeMrgErr:  0000000000000001 0000000000000001 0000000000000000
0000000000000000
[ 1580.233192] PblErr:     0000000000000800 0000000000000800 0000000000000000
00000000028de410
[ 1580.233225] RegbErr:    0010001000000000 0000001000000000 4800003c00000000
0000000000000200
[ 1580.233259] EEH: Reset with hotplug activity
[ 1580.891352] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register
0x2f0d00. -5
[ 1590.340025] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=7463, emitted seq=7465
[ 1590.340117] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[ 1590.340172] amdgpu 0033:01:00.0: GPU reset begin!
[ 1590.350000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=325761, emitted seq=325763
[ 1590.350057] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process hyperspace pid 4160 thread hyperspace:cs0 pid 4161
[ 1590.350089] amdgpu 0033:01:00.0: GPU reset begin!
[ 1590.350108] [drm] Bailing on TDR for s_job:4f608, as another already in
progress
[ 1590.350923] amdgpu: [powerplay]
[ 1590.350923]  last message was failed ret is 65535
[ 1590.350949] amdgpu: [powerplay]
[ 1590.350949]  failed to send message 261 ret is 65535
[ 1590.350971] amdgpu: [powerplay]
[ 1590.350971]  last message was failed ret is 65535
[ 1590.350983] amdgpu: [powerplay]
[ 1590.350983]  failed to send message 261 ret is 65535
[ 1590.350996] amdgpu: [powerplay]
[ 1590.350996]  last message was failed ret is 65535
[ 1590.351017] amdgpu: [powerplay]
[ 1590.351017]  failed to send message 261 ret is 65535
[ 1590.351030] amdgpu: [powerplay]
[ 1590.351030]  last message was failed ret is 65535
[ 1590.351064] amdgpu: [powerplay]
[ 1590.351064]  failed to send message 261 ret is 65535
[ 1590.351096] amdgpu: [powerplay]
[ 1590.351096]  last message was failed ret is 65535
[ 1590.351127] amdgpu: [powerplay]
[ 1590.351127]  failed to send message 261 ret is 65535
[ 1590.351158] amdgpu: [powerplay]
[ 1590.351158]  last message was failed ret is 65535
[ 1590.351202] amdgpu: [powerplay]
[ 1590.351202]  failed to send message 261 ret is 65535
[ 1590.351224] amdgpu: [powerplay]
[ 1590.351224]  last message was failed ret is 65535
[ 1590.351236] amdgpu: [powerplay]
[ 1590.351236]  failed to send message 261 ret is 65535
[ 1590.351251] amdgpu: [powerplay]
[ 1590.351251]  last message was failed ret is 65535
[ 1590.351272] amdgpu: [powerplay]
[ 1590.351272]  failed to send message 261 ret is 65535
[ 1590.351303] amdgpu: [powerplay]
[ 1590.351303]  last message was failed ret is 65535
[ 1590.351324] amdgpu: [powerplay]
[ 1590.351324]  failed to send message 261 ret is 65535
[ 1590.351356] amdgpu: [powerplay]
[ 1590.351356]  last message was failed ret is 65535
[ 1590.351378] amdgpu: [powerplay]
[ 1590.351378]  failed to send message 261 ret is 65535
[ 1590.351410] amdgpu: [powerplay]
[ 1590.351410]  last message was failed ret is 65535
[ 1590.351441] amdgpu: [powerplay]
[ 1590.351441]  failed to send message 261 ret is 65535
[ 1590.351463] amdgpu: [powerplay]
[ 1590.351463]  last message was failed ret is 65535
[ 1590.351485] amdgpu: [powerplay]
[ 1590.351485]  failed to send message 261 ret is 65535
[ 1590.351520] amdgpu: [powerplay]
[ 1590.351520]  last message was failed ret is 65535
[ 1590.351541] amdgpu: [powerplay]
[ 1590.351541]  failed to send message 261 ret is 65535
[ 1590.351572] amdgpu: [powerplay]
[ 1590.351572]  last message was failed ret is 65535
[ 1590.351603] amdgpu: [powerplay]
[ 1590.351603]  failed to send message 261 ret is 65535
[ 1590.351634] amdgpu: [powerplay]
[ 1590.351634]  last message was failed ret is 65535
[ 1590.351666] amdgpu: [powerplay]
[ 1590.351666]  failed to send message 261 ret is 65535
[ 1590.351698] amdgpu: [powerplay]
[ 1590.351698]  last message was failed ret is 65535
[ 1590.351730] amdgpu: [powerplay]
[ 1590.351730]  failed to send message 261 ret is 65535
[ 1590.351761] amdgpu: [powerplay]
[ 1590.351761]  last message was failed ret is 65535
[ 1590.351795] amdgpu: [powerplay]
[ 1590.351795]  failed to send message 261 ret is 65535
[ 1590.351980] amdgpu: [powerplay]
[ 1590.351980]  last message was failed ret is 65535
[ 1590.352014] amdgpu: [powerplay]
[ 1590.352014]  failed to send message 306 ret is 65535
[ 1590.352039] amdgpu: [powerplay]
[ 1590.352039]  last message was failed ret is 65535
[ 1590.352080] amdgpu: [powerplay]
[ 1590.352080]  failed to send message 5e ret is 65535
[ 1590.352103] amdgpu: [powerplay]
[ 1590.352103]  last message was failed ret is 65535
[ 1590.352134] amdgpu: [powerplay]
[ 1590.352134]  failed to send message 145 ret is 65535
[ 1590.352156] amdgpu: [powerplay]
[ 1590.352156]  last message was failed ret is 65535
[ 1590.352190] amdgpu: [powerplay]
[ 1590.352190]  failed to send message 146 ret is 65535
[ 1590.352225] amdgpu: [powerplay]
[ 1590.352225]  last message was failed ret is 65535
[ 1590.352271] amdgpu: [powerplay]
[ 1590.352271]  failed to send message 148 ret is 65535
[ 1590.352292] amdgpu: [powerplay]
[ 1590.352292]  last message was failed ret is 65535
[ 1590.352304] amdgpu: [powerplay]
[ 1590.352304]  failed to send message 145 ret is 65535
[ 1590.352339] amdgpu: [powerplay]
[ 1590.352339]  last message was failed ret is 65535
[ 1590.352370] amdgpu: [powerplay]
[ 1590.352370]  failed to send message 146 ret is 65535
[ 1590.383835] [drm] REG_WAIT timeout 10us * 3000 tries -
dce110_stream_encoder_dp_blank line:956
[ 1590.383875] ------------[ cut here ]------------
[ 1590.383912] WARNING: CPU: 48 PID: 1214 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332
generic_reg_wait+0x214/0x230 [amdgpu]
[ 1590.383945] Modules linked in: i2c_dev uinput amdgpu snd_usb_audio
drm_vram_helper snd_usbmidi_lib gpu_sched ttm snd_rawmidi snd_seq_device ses mc
drm_kms_helper snd_hda_codec_hdmi enclosure joydev sd_mod evdev
scsi_transport_sas drm snd_hda_intel sg snd_hda_codec
drm_panel_orientation_quirks snd_hda_core syscopyarea sysfillrect ecb snd_hwdep
aacraid sysimgblt fb_sys_fops snd_pcm nvme nvme_core xts i2c_algo_bit snd_timer
snd soundcore ctr cbc ofpart vmx_crypto ipmi_powernv ipmi_devintf powernv_flash
gf128mul mtd ipmi_msghandler opal_prd at24 binfmt_misc parport_pc lp parport
ip_tables x_tables autofs4 nfsv3 nfs_acl nfs lockd grace sunrpc fscache raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath
linear md_mod xhci_pci xhci_hcd usbcore tg3 libphy
[ 1590.384181] CPU: 48 PID: 1214 Comm: kworker/48:2 Not tainted 5.4.0 #5
[ 1590.384194] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 1590.384205] NIP:  c00800000888505c LR: c00800000888504c CTR:
c000000000715d70
[ 1590.384238] REGS: c0000007dd55ec40 TRAP: 0700   Not tainted  (5.4.0)
[ 1590.384257] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR:
28224228  XER: 00000000
[ 1590.384284] CFAR: c0000000001b66f4 IRQMASK: 0
[ 1590.384284] GPR00: c00800000888504c c0000007dd55eed0 c0080000089f5000
0000000000000052
[ 1590.384284] GPR04: c0000007fdd1ce18 c0000007fdda5858 0000000000000490
c0000007fffc9000
[ 1590.384284] GPR08: 0000000000000007 0000000000000000 00000007fced0000
9000000002001033
[ 1590.384284] GPR12: 0000000000004000 c0000007fffc9000 c000200715000000
c0000007eff449c0
[ 1590.384284] GPR16: c0000007dc7a6000 c0000007def45300 0000000000000000
00000000000003bc
[ 1590.384284] GPR20: c0080000088f6470 0000000000000000 0000000000004ea4
0000000000010000
[ 1590.384284] GPR24: 0000000000000000 c00800000890ca90 c0000007a9e40680
0000000000000bb8
[ 1590.384284] GPR28: 0000000000000010 0000000000000bb8 000000000000000a
0000000000000bb9
[ 1590.384414] NIP [c00800000888505c] generic_reg_wait+0x214/0x230 [amdgpu]
[ 1590.384450] LR [c00800000888504c] generic_reg_wait+0x204/0x230 [amdgpu]
[ 1590.384467] Call Trace:
[ 1590.384499] [c0000007dd55eed0] [c00800000888504c]
generic_reg_wait+0x204/0x230 [amdgpu] (unreliable)
[ 1590.384548] [c0000007dd55efa0] [c00800000882caec]
dce110_stream_encoder_dp_blank+0x104/0x170 [amdgpu]
[ 1590.384601] [c0000007dd55f030] [c00800000885a07c]
dce110_blank_stream+0xf4/0x120 [amdgpu]
[ 1590.384632] [c0000007dd55f060] [c0080000088743bc]
core_link_disable_stream+0x64/0x420 [amdgpu]
[ 1590.384692] [c0000007dd55f140] [c008000008857dbc]
dce110_reset_hw_ctx_wrap+0xf4/0x2e0 [amdgpu]
[ 1590.384745] [c0000007dd55f200] [c00800000885a2e0]
dce110_apply_ctx_to_hw+0x58/0x600 [amdgpu]
[ 1590.384797] [c0000007dd55f2d0] [c00800000886dcec]
dc_commit_state+0x3d4/0x820 [amdgpu]
[ 1590.384853] [c0000007dd55f400] [c0080000087fe94c]
amdgpu_dm_atomic_commit_tail+0x3c4/0x19a8 [amdgpu]
[ 1590.384888] [c0000007dd55f700] [c008000007d93fb0] commit_tail+0xf8/0x1f0
[drm_kms_helper]
[ 1590.384912] [c0000007dd55f740] [c008000007d942a8]
drm_atomic_helper_commit+0x1e0/0x1f0 [drm_kms_helper]
[ 1590.384951] [c0000007dd55f780] [c0080000087fbac8]
amdgpu_dm_atomic_commit+0x110/0x140 [amdgpu]
[ 1590.384992] [c0000007dd55f7e0] [c0080000079ce2cc]
drm_atomic_commit+0x74/0xa0 [drm]
[ 1590.385016] [c0000007dd55f850] [c008000007d94768]
drm_atomic_helper_disable_all+0x290/0x2b0 [drm_kms_helper]
[ 1590.385044] [c0000007dd55f8a0] [c008000007d949dc]
drm_atomic_helper_suspend+0x154/0x1a0 [drm_kms_helper]
[ 1590.385094] [c0000007dd55f920] [c0080000087f717c] dm_suspend+0x44/0xa0
[amdgpu]
[ 1590.385124] [c0000007dd55f950] [c008000008621e2c]
amdgpu_device_ip_suspend_phase1+0xe4/0x190 [amdgpu]
[ 1590.385163] [c0000007dd55f9d0] [c008000008623ddc]
amdgpu_device_ip_suspend+0x44/0xe0 [amdgpu]
[ 1590.385192] [c0000007dd55fa10] [c00800000888de54]
amdgpu_device_pre_asic_reset+0x248/0x28c [amdgpu]
[ 1590.385230] [c0000007dd55fab0] [c00800000888e7b8]
amdgpu_device_gpu_recover+0x2f0/0xb4c [amdgpu]
[ 1590.385268] [c0000007dd55fb90] [c008000008779f3c]
amdgpu_job_timedout+0x124/0x170 [amdgpu]
[ 1590.385290] [c0000007dd55fc30] [c008000007651244]
drm_sched_job_timedout+0x6c/0x110 [gpu_sched]
[ 1590.385336] [c0000007dd55fc70] [c000000000154ee0]
process_one_work+0x260/0x520
[ 1590.385379] [c0000007dd55fd10] [c000000000155228] worker_thread+0x88/0x5f0
[ 1590.385400] [c0000007dd55fdb0] [c00000000015f21c] kthread+0x19c/0x1b0
[ 1590.385430] [c0000007dd55fe20] [c00000000000bd54]
ret_from_kernel_thread+0x5c/0x68
[ 1590.385463] Instruction dump:
[ 1590.385480] 4bfffed4 3c620000 e8633ab8 7e679b78 7e86a378 7f65db78 7fc4f378
4800f091
[ 1590.385513] e8410018 813a0020 2f890001 419eff7c <0fe00000> 4bffff74 60000000
60000000
[ 1590.385546] ---[ end trace 59567a2f8b8649ed ]---
[ 1591.478349] PCI 0033:01:00.0#0000: EEH: 2100000 reads ignored for recovering
device at location=CPU2 Slot1 (16x) driver=amdgpu
[ 1591.478370] PCI 0033:01:00.0#0000: EEH: Might be infinite loop in amdgpu
driver
[ 1591.478382] CPU: 48 PID: 1214 Comm: kworker/48:2 Tainted: G        W        
5.4.0 #5
[ 1591.478405] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 1591.478414] Call Trace:
[ 1591.478422] [c0000007dd55e940] [c000000000a9ccc8] dump_stack+0xbc/0x104
(unreliable)
[ 1591.478434] [c0000007dd55e980] [c00000000003e788]
eeh_dev_check_failure+0x598/0x6b0
[ 1591.478455] [c0000007dd55ea30] [c00000000003eb08]
eeh_check_failure+0x98/0x100
[ 1591.478491] [c0000007dd55ea70] [c008000008622744] amdgpu_mm_rreg+0x20c/0x250
[amdgpu]
[ 1591.478539] [c0000007dd55eac0] [c0080000086298f4] cail_reg_read+0x2c/0x50
[amdgpu]
[ 1591.478577] [c0000007dd55eae0] [c00800000863255c]
atom_get_src_int+0x104/0xa00 [amdgpu]
[ 1591.478615] [c0000007dd55eb90] [c008000008633e30] atom_op_test+0xd8/0x1d0
[amdgpu]
[ 1591.478660] [c0000007dd55ec20] [c008000008636a2c]
amdgpu_atom_execute_table_locked+0x204/0x3e0 [amdgpu]
[ 1591.478701] [c0000007dd55ed20] [c008000008636d30]
atom_op_calltable+0x128/0x1e0 [amdgpu]
[ 1591.478740] [c0000007dd55eda0] [c008000008636a2c]
amdgpu_atom_execute_table_locked+0x204/0x3e0 [amdgpu]
[ 1591.478770] [c0000007dd55eea0] [c008000008636e58]
amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
[ 1591.478829] [c0000007dd55eee0] [c008000008810f30]
transmitter_control_v1_6+0x128/0x220 [amdgpu]
[ 1591.478887] [c0000007dd55ef40] [c00800000880c410]
bios_parser_transmitter_control+0x38/0x70 [amdgpu]
[ 1591.478944] [c0000007dd55ef60] [c00800000882f678]
dce110_link_encoder_disable_output+0xd0/0x1c0 [amdgpu]
[ 1591.478997] [c0000007dd55f020] [c00800000887cbfc]
dp_disable_link_phy+0xa4/0x1d0 [amdgpu]
[ 1591.479029] [c0000007dd55f060] [c008000008874488]
core_link_disable_stream+0x130/0x420 [amdgpu]
[ 1591.479082] [c0000007dd55f140] [c008000008857dbc]
dce110_reset_hw_ctx_wrap+0xf4/0x2e0 [amdgpu]
[ 1591.479134] [c0000007dd55f200] [c00800000885a2e0]
dce110_apply_ctx_to_hw+0x58/0x600 [amdgpu]
[ 1591.479186] [c0000007dd55f2d0] [c00800000886dcec]
dc_commit_state+0x3d4/0x820 [amdgpu]
[ 1591.479241] [c0000007dd55f400] [c0080000087fe94c]
amdgpu_dm_atomic_commit_tail+0x3c4/0x19a8 [amdgpu]
[ 1591.479280] [c0000007dd55f700] [c008000007d93fb0] commit_tail+0xf8/0x1f0
[drm_kms_helper]
[ 1591.479325] [c0000007dd55f740] [c008000007d942a8]
drm_atomic_helper_commit+0x1e0/0x1f0 [drm_kms_helper]
[ 1591.479381] [c0000007dd55f780] [c0080000087fbac8]
amdgpu_dm_atomic_commit+0x110/0x140 [amdgpu]
[ 1591.479419] [c0000007dd55f7e0] [c0080000079ce2cc]
drm_atomic_commit+0x74/0xa0 [drm]
[ 1591.479445] [c0000007dd55f850] [c008000007d94768]
drm_atomic_helper_disable_all+0x290/0x2b0 [drm_kms_helper]
[ 1591.479484] [c0000007dd55f8a0] [c008000007d949dc]
drm_atomic_helper_suspend+0x154/0x1a0 [drm_kms_helper]
[ 1591.479542] [c0000007dd55f920] [c0080000087f717c] dm_suspend+0x44/0xa0
[amdgpu]
[ 1591.479589] [c0000007dd55f950] [c008000008621e2c]
amdgpu_device_ip_suspend_phase1+0xe4/0x190 [amdgpu]
[ 1591.479640] [c0000007dd55f9d0] [c008000008623ddc]
amdgpu_device_ip_suspend+0x44/0xe0 [amdgpu]
[ 1591.479674] [c0000007dd55fa10] [c00800000888de54]
amdgpu_device_pre_asic_reset+0x248/0x28c [amdgpu]
[ 1591.479712] [c0000007dd55fab0] [c00800000888e7b8]
amdgpu_device_gpu_recover+0x2f0/0xb4c [amdgpu]
[ 1591.479769] [c0000007dd55fb90] [c008000008779f3c]
amdgpu_job_timedout+0x124/0x170 [amdgpu]
[ 1591.479815] [c0000007dd55fc30] [c008000007651244]
drm_sched_job_timedout+0x6c/0x110 [gpu_sched]
[ 1591.479860] [c0000007dd55fc70] [c000000000154ee0]
process_one_work+0x260/0x520
[ 1591.479903] [c0000007dd55fd10] [c000000000155228] worker_thread+0x88/0x5f0
[ 1591.479923] [c0000007dd55fdb0] [c00000000015f21c] kthread+0x19c/0x1b0
[ 1591.479953] [c0000007dd55fe20] [c00000000000bd54]
ret_from_kernel_thread+0x5c/0x68
[ 1592.584699] PCI 0033:01:00.0#0000: EEH: 4200000 reads ignored for recovering
device at location=CPU2 Slot1 (16x) driver=amdgpu
[ 1592.584723] PCI 0033:01:00.0#0000: EEH: Might be infinite loop in amdgpu
driver
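
For anyone wanting to compare this register dump against one from a working
kernel, the firmware lines above can be turned into a name/value table. The
following is only a rough sketch: the function names parse_phb_dump and
nonzero are made up for illustration, the regular expression is an assumption
based on the line layout seen in this trace, and only the first hex word after
the "=" is kept (so the second PEST word is ignored).

```python
import re

# Assumed layout, taken from the firmware lines in this trace:
#   [<timestamp>,<level>] PHB#<id>[<chip>:<index>]:   <name> = <hex value>
LINE_RE = re.compile(
    r"^\[\s*[\d.]+,\d+\]\s+PHB#[0-9A-Fa-f]+\[\d+:\d+\]:\s+(.+?)\s*=\s*([0-9A-Fa-f]+)"
)


def parse_phb_dump(text):
    """Return {register name: integer value} for every 'name = hex' line."""
    regs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def nonzero(regs):
    """Keep only the registers that report a non-zero value."""
    return {name: value for name, value in regs.items() if value != 0}


# A few lines copied from the trace, including one wrapped continuation line
# that does not match the pattern and is therefore skipped.
SAMPLE = """\
[20341.276752702,3] PHB#0033[8:3]: PHB Freeze/Fence detected !
[20341.276848173,3] PHB#0033[8:3]:             PCI FIR=2000000000000000
[20341.277068184,3] PHB#0033[8:3]:            ERR RPT1=0000000000000000
[20341.277830701,3] PHB#0033[8:3]:                  brdgCtl = 00000002
[20341.280218925,3] PHB#0033[8:3]:                PEST[1ff] = 3740002a01000000
0000000000000000
"""

if __name__ == "__main__":
    for name, value in nonzero(parse_phb_dump(SAMPLE)).items():
        print(f"{name:>12} = {value:#x}")
```

Filtering to the non-zero registers makes it easier to see which error
status and log registers actually recorded something when the PHB froze.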


