[Bug 215315] New: [REGRESSION BISECTED] amdgpu crashes system suspend - NUC8i7HVKVA
bugzilla-daemon at bugzilla.kernel.org
Sun Dec 12 23:08:28 UTC 2021
https://bugzilla.kernel.org/show_bug.cgi?id=215315
Bug ID: 215315
Summary: [REGRESSION BISECTED] amdgpu crashes system suspend -
NUC8i7HVKVA
Product: Drivers
Version: 2.5
Kernel Version: 5.15-rc1, 5.15, 5.16-rc4
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri at kernel-bugs.osdl.org
Reporter: lenb at kernel.org
Regression: No
My Intel NUC8i7HVKVA has an AMD GPU.
Until 5.15-rc1, this machine was rock solid in suspend stress testing -- never
crashing after hundreds of hours of back-to-back suspend cycles.
That changed when this patch went upstream:
commit f7d6779df642720e22bffd449e683bb8690bd3bf (refs/bisect/bad)
Author: Guchun Chen <guchun.chen at amd.com>
Date: Fri Aug 27 18:31:41 2021 +0800
drm/amdgpu: stop scheduler when calling hw_fini (v2)
This gurantees no more work on the ring can be submitted
to hardware in suspend/resume case, otherwise a potential
race will occur and the ring will get no chance to stay
empty before suspend.
v2: Call drm_sched_resubmit_job before drm_sched_start to
restart jobs from the pending list.
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
Suggested-by: Christian König <christian.koenig at amd.com>
Signed-off-by: Guchun Chen <guchun.chen at amd.com>
Reviewed-by: Christian König <christian.koenig at amd.com>
Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
Cc: stable at vger.kernel.org
Bisection shows that the kernel just before this patch was integrated can handle
over 1,000 back-to-back "freeze" system suspend cycles. Yet, when this patch is
present, the system may crash before completing even 100 cycles, and lasts a
few hundred cycles at most.
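For anyone trying to reproduce this, a back-to-back "freeze" stress loop might
look roughly like the sketch below. It is not the script used for the testing
described in this report. It assumes util-linux's rtcwake is installed and run
as root, and the function names and the 15 second wake delay are hypothetical
choices for illustration.

```python
import subprocess


def build_freeze_command(wake_after_s=15):
    # rtcwake (util-linux): "-m freeze" requests suspend-to-idle and
    # "-s N" arms the RTC to wake the system after N seconds.
    # The 15 second default is an arbitrary, assumed value.
    return ["rtcwake", "-m", "freeze", "-s", str(wake_after_s)]


def stress_suspend(cycles, command, run=subprocess.run):
    """Run `command` back-to-back; return how many cycles completed.

    Each cycle number is printed and flushed before suspending, so the
    last line visible in a log hints at which cycle the machine died in.
    """
    completed = 0
    for cycle in range(1, cycles + 1):
        print(f"cycle {cycle}", flush=True)
        result = run(command)
        if result.returncode != 0:
            # The suspend command itself failed; stop rather than spin.
            break
        completed += 1
    return completed
```

Calling stress_suspend(1000, build_freeze_command()) as root would attempt
1,000 cycles. A hard hang like the one described here never returns from the
loop, so the last logged cycle number is what to look at after the cold reboot.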
This crash is present in all following upstream rc's, including 5.15-rc4.
When I revert this patch from 5.15-rc4, stability returns.
Usually, the crash manifests as a black screen and a system that does not
respond to ping. It will only respond to a long press of the power button to
remove power, followed by a cold reboot.
When I have witnessed the crash occur, the "ubuntu color themed" screen enters
some sort of reverse video mode. In this weird color mode, I've seen a text
window oscillate between scrolling and un-scrolling for a line -- sort of like
it is going back in time, but then changes its mind. There is no response to
keyboard, mouse, or network input.