[Bug 216119] 087451f372bf76d breaks hibernation on amdgpu Radeon R9 390

Tue Aug 9 18:53:00 UTC 2022

https://bugzilla.kernel.org/show_bug.cgi?id=216119

--- Comment #35 from Harald Judt (h.judt at gmx.at) ---
I have not had time yet to try any patches, but here are more detailed dmesg
messages from when things go awry after resuming from hibernation and
vt-switching (see symptoms described above). Maybe they will give someone
additional hints about what's going wrong:

[drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_dm_atomic_commit_tail] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x000cc40c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C0C400C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 6, pasid 0) at page 0, read
from 'TC7' (0x54433700) (196)
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0004c40c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C400C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 0) at page 0, read
from 'TC7' (0x54433700) (196)
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x000ac40c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C400C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 5, pasid 0) at page 0, read
from 'TC7' (0x54433700) (196)
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0004c40c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C400C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 0) at page 0, read
from 'TC7' (0x54433700) (196)
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0004480c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 0) at page 0, read
from 'TC0' (0x54433000) (72)
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0004480c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 2, pasid 0) at page 0, read
from 'TC0' (0x54433000) (72)
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x000a480c
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
amdgpu 0000:01:00.0: amdgpu:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 5, pasid 0) at page 0, read
from 'TC0' (0x54433000) (72)
[drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, but soft recovered
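
For reading the fault lines above: the VM_CONTEXT1_PROTECTION_FAULT_STATUS
values appear to carry the same fields that the "VM fault" lines print
(protections, vmid, read/write, memory client id). Below is a minimal sketch
of that decoding. The bit layout is an assumption inferred from the logged
values, not copied from the driver headers, and the helper names are
hypothetical.

```python
# Assumed bit layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS.
# Inferred from the values in this log; not taken from the amdgpu headers.
PROTECTIONS_MASK = 0x000000FF       # bits 0-7
MEMORY_CLIENT_ID_MASK = 0x000FF000  # bits 12-19
MEMORY_CLIENT_ID_SHIFT = 12
MEMORY_CLIENT_RW_MASK = 0x01000000  # bit 24, set means write
VMID_MASK = 0x1E000000              # bits 25-28
VMID_SHIFT = 25


def decode_fault_status(status):
    """Split a fault status value into the fields shown in the log."""
    return {
        "protections": status & PROTECTIONS_MASK,
        "client_id": (status & MEMORY_CLIENT_ID_MASK) >> MEMORY_CLIENT_ID_SHIFT,
        "access": "write" if status & MEMORY_CLIENT_RW_MASK else "read",
        "vmid": (status & VMID_MASK) >> VMID_SHIFT,
    }


def decode_client_name(mc_client):
    """Read the 32-bit client value as big-endian ASCII, dropping NUL padding."""
    return mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


# Status values copied from the dmesg excerpt above.
for status in (0x0C0C400C, 0x040C400C, 0x0A0C400C, 0x0404800C, 0x0A04800C):
    print(f"0x{status:08X}", decode_fault_status(status))

# Client values copied from the dmesg excerpt above.
for mc_client in (0x54433700, 0x54433000):
    print(f"0x{mc_client:08X}", decode_client_name(mc_client))
```

Under these assumptions, 0x0C0C400C decodes to protections 0x0c, vmid 6, read,
client id 196, which agrees with the corresponding "VM fault" line. All faults
are reads at page 0, from vmids 2, 5 and 6.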

The funny thing was that another X session was still somehow usable (it takes a
while to switch to it because of the hangs). But in general, those hangs when
vt-switching suck.

I will try to revert all the fbdev patches again to see if this also happens
with the old fb code, though I cannot remember that it did.

I will also test whether this happens when using only S3 (suspend-to-RAM)
instead of S4 (hibernation).

It will probably take me a few days until I can get to it though.
