Kernel bug related to drivers/gpu/drm/ttm/ttm_bo.c

Christopher Klooz py0xc3 at my.mail.de
Sun May 28 16:07:36 UTC 2023


Hi!

I think we have a serious kernel bug that is related to, or located in,
drivers/gpu/drm/ttm/ttm_bo.c

I base this on one of my recent system freezes with kernel 6.3.4, which
was accompanied by a large number of kernel errors in journalctl. An
extract from the logs:

...
May 28 14:38:41 fedora.domain kernel: WARNING: CPU: 4 PID: 5523 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x289/0x2e0 [ttm]
...
May 28 14:38:41 fedora.domain kernel: WARNING: CPU: 4 PID: 5523 at drivers/gpu/drm/ttm/ttm_bo.c:327 ttm_bo_release+0x296/0x2e0 [ttm]
...
May 28 14:38:41 fedora.domain kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:193!
...
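
In case it helps with triaging similar logs, here is a minimal Python
sketch that pulls WARNING and "kernel BUG" lines for a given source file
out of journal text. The function name find_kernel_events and the regular
expression are my own illustration, not an existing tool, and the pattern
only assumes the line layout shown in the extract above.

```python
import re

# Matches "WARNING: ... at <file>:<line>" and "kernel BUG at <file>:<line>!".
# The pattern is an assumption based on the log lines quoted in this mail.
KERNEL_EVENT = re.compile(
    r"(?P<kind>WARNING|kernel BUG)\b.*?\bat (?P<file>\S+?):(?P<line>\d+)"
)


def find_kernel_events(log_text, source_file="drivers/gpu/drm/ttm/ttm_bo.c"):
    """Return (kind, line_number) tuples for events raised in source_file."""
    events = []
    for entry in log_text.splitlines():
        match = KERNEL_EVENT.search(entry)
        if match and match.group("file") == source_file:
            events.append((match.group("kind"), int(match.group("line"))))
    return events
```

The input would be saved journal text, for example the output of
"journalctl -k" redirected to a file and read into a string.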

The above is more detailed than most occurrences, and it is the first
occurrence that did not end in a freeze immediately or within a few
seconds. However, the corrupted state of the system became apparent
again when I tried to shut down some time after the above errors:

...

May 28 14:51:09 fedora.domain kernel: #PF: error_code(0x0000) - not-present page
May 28 14:51:09 fedora.domain kernel: #PF: supervisor read access in kernel mode
May 28 14:51:09 fedora.domain kernel: BUG: unable to handle page fault for address: 0000003000300010
...

I have had this issue for some time, at least since 6.2.X.

You can find my bug report and many full logs (including the full logs 
of the above) from root's journalctl in: 
https://bugzilla.redhat.com/show_bug.cgi?id=2193110

Ignore the title and the initial comments of the bug report; it is
definitely not related to Firefox. If you want to focus on the kernel
error logs of 6.3.X, the last 5 comments are the relevant ones.

In addition to the journalctl error logs that I already linked in the
bug report, I tested 6.3.4 once again today with amd_pstate=active (by
default I am on amd_pstate=passive, which feels most stable on my
hardware) -> see
https://gitlab.com/py0xc31/public-tmp-storage/-/blob/main/retry6.3.4/fullSystemFreeze.kernel6.3.4.pstate-ACTIVE.log
(I have not yet put this into the bug report since I no longer assume it
is relevant)


Some other people from Fedora have experienced related issues; see the 
comments on the test result pages in our update system:

https://bodhi.fedoraproject.org/updates/FEDORA-2023-514965dd8a (6.3.3 & 
6.3.4)

https://bodhi.fedoraproject.org/updates/FEDORA-2023-26325e5399 (6.2.15)
-> I am quite sure I had already seen this issue before 6.2.15.

Maybe also related (but without explicit information referring to ttm_bo.c):

https://gitlab.freedesktop.org/drm/amd/-/issues/2548

https://gitlab.freedesktop.org/drm/amd/-/issues/2447


Let me know if you need more information or if I can help with testing.

My hardware: AMD Ryzen 6850 Pro. I have no dedicated graphics card, only
the integrated AMD graphics of my Ryzen. I use Fedora 38 KDE -> cat
/proc/sys/kernel/tainted = 0.
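
For anyone comparing taint values from their own machines, here is a
small Python sketch that decodes the number from
/proc/sys/kernel/tainted into readable reasons. The helper name
decode_taint is hypothetical, and the table covers only a handful of
bits as I understand them from the kernel's tainted-kernels
documentation; any other bit is reported by number only.

```python
# Partial table of taint bits; an assumption based on the kernel's
# tainted-kernels documentation, deliberately not exhaustive.
TAINT_FLAGS = {
    0: "P: proprietary module loaded",
    7: "D: kernel died recently (OOPS or BUG)",
    9: "W: kernel issued a warning",
    12: "O: out-of-tree module loaded",
    13: "E: unsigned module loaded",
}


def decode_taint(value):
    """Return a description for every bit set in the taint value."""
    reasons = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            reasons.append(TAINT_FLAGS.get(bit, f"bit {bit} (see kernel docs)"))
        bit += 1
    return reasons
```

On a live system the value could be read with
int(open("/proc/sys/kernel/tainted").read()); a value of 0 yields an
empty list.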

I will try updating my BIOS in the next few days, when I have time, to
see if that makes a difference, but given the logs I doubt it is related.


Regards,

Chris

