[Bug 213145] AMDGPU resets, times out and crashes after "ERROR Waiting for fences timed out!"

Mon Sep 12 17:07:07 UTC 2022

https://bugzilla.kernel.org/show_bug.cgi?id=213145

nvaert1986 (nvaert1986 at hotmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nvaert1986 at hotmail.com

--- Comment #21 from nvaert1986 (nvaert1986 at hotmail.com) ---
I'm experiencing the same issue on kernel 5.19 with Mesa. It happens rarely,
but when it does, my whole system needs a reboot. So far I've seen it happen
with Firefox and Steam.

[drm:0xffffffffc04e61a6] *ERROR* Waiting for fences timed out!
[drm:0xffffffffc04e61a6] *ERROR* Waiting for fences timed out!
[drm:0xffffffffc0465370] *ERROR* Process information: process firefox pid 1918 thread firefox:cs0 pid 2069
amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
amdgpu 0000:03:00.0: [drm:0xffffffffc0321d49] *ERROR* ring kiq_2.1.0 test failed (-110)
[drm:0xffffffffc03b4bfc] *ERROR* KGQ disable failed
[drm:0xffffffffc03b4a60] *ERROR* failed to halt cp gfx
[drm] free PSP TMR buffer
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read NO_PASID] Request device [03:00.0] fault addr 0x77d0541a000 [fault reason 0x04] Access beyond MGAW
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read NO_PASID] Request device [03:00.0] fault addr 0x77d0541e000 [fault reason 0x04] Access beyond MGAW
CPU: 0 PID: 1028 Comm: kworker/u48:22 Tainted: G           O      5.19.1
Hardware name: Micro-Star International Co., Ltd. MS-7D31/MPG Z690 EDGE WIFI DDR4 (MS-7D31), BIOS 1.40 05/18/2022
Workqueue: amdgpu-reset-dev 0xffffffffc0242a90
Call Trace:
  <TASK>
0xffffffffa28f2514
0xffffffffc065e673
0xffffffffc065ef99
0xffffffffc04653ca
0xffffffffc0242aeb
0xffffffffa1f30ae8
amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
0xffffffffa1f31048
? 0xffffffffa1f31000
0xffffffffa1f372fa
? 0xffffffffa1f37220
0xffffffffa1e010ef
</TASK>
amdgpu 0000:03:00.0: amdgpu: MODE1 reset
amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[drm] VRAM is lost due to GPU reset!
[drm] PSP is resuming...

At this point my GPU is fully reinitialized, but then the driver logs:

[drm] Skip scheduling IBs!

and the crash cycle starts over again.
