AMD Radeon out of memory causes system instability.
James Lawrence
james at egdaemon.com
Mon Aug 5 06:02:59 UTC 2024
Apologies if this is the wrong mailing list. Long-time user, first-time reporter and all that.
Recently my system has been suffering from instability in the graphics stack. Essentially, some application on my system is exhausting graphics memory (a GPU out-of-memory condition).
Normally I'd expect a hard crash of the application in such a scenario. Instead, the system enters a spin loop of command submissions
and slows down dramatically, generally ending with the whole system freezing up.
There are a couple of issues I'd like to point out with the situation I'm experiencing:
- Most importantly, the error message doesn't provide any useful information for tracing the source of the issue: no PID or other diagnostic information.
- It's very noisy when trying to debug. I can occasionally drop to a separate TTY, but the message spams the entire screen, making it impossible to interact with the system even if I wanted to load debugging tools to analyze the situation.
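To illustrate the noise problem, here is a minimal sketch of a filter that collapses runs of identical kernel messages in a saved log, so a flood of the same error reads as one line with a count. It assumes the log was captured with something like `dmesg > kernel.log` and that each line starts with the usual `[seconds.micros]` timestamp; the `collapse` helper and the file name are hypothetical, and this only helps when reading the log afterwards, not with the console spam itself.

```python
import re
import sys

# Assumed prefix format of dmesg output: "[   12.345678] message".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def collapse(lines):
    """Yield (message, count) for each run of consecutive identical messages."""
    prev, count = None, 0
    for line in lines:
        # Strip the timestamp so repeated messages compare equal.
        msg = TIMESTAMP.sub("", line.rstrip("\n"))
        if msg == prev:
            count += 1
            continue
        if prev is not None:
            yield prev, count
        prev, count = msg, 1
    if prev is not None:
        yield prev, count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 collapse.py kernel.log
    with open(sys.argv[1], errors="replace") as f:
        for msg, n in collapse(f):
            print(f"{msg}  (x{n})" if n > 1 else msg)
```

Run against a saved log, thousands of repeated "Not enough memory for command submission!" lines would shrink to a single entry, leaving any surrounding messages visible.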
Given the error message below, I believe the linked line is the source of the log statement:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1431
Generally, I'm wondering if anything can be done to improve the experience for end users in such a scenario.
Ideally the system would kill the misbehaving process, similar to how system RAM OOMs are handled.
At a minimum, I'd like to be able to trace this back to the misbehaving process. Any help in this regard would be appreciated.
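One possible way to trace usage back to a process, sketched here under assumptions: recent kernels expose per-client DRM statistics in `/proc/<pid>/fdinfo/<fd>`, and the script below assumes those files contain `drm-client-id:` and `drm-memory-vram:` lines in the format described by the kernel's DRM client usage stats documentation. Whether a given kernel and amdgpu version emits exactly these keys is an assumption, the helper names are hypothetical, and reading other users' processes generally requires root.

```python
import os
import re

# Assumed line formats, per the kernel's DRM client usage stats documentation:
#   drm-client-id: <id>
#   drm-memory-<region>: <value> [KiB|MiB]
MEM_LINE = re.compile(r"^drm-memory-(\w+):\s+(\d+)\s*(KiB|MiB)?\s*$")
CLIENT_LINE = re.compile(r"^drm-client-id:\s+(\d+)\s*$", re.MULTILINE)
UNIT = {None: 1, "KiB": 1024, "MiB": 1024 * 1024}


def parse_fdinfo(text):
    """Return {region: bytes} for the drm-memory-* lines of one fdinfo file."""
    usage = {}
    for line in text.splitlines():
        m = MEM_LINE.match(line)
        if m:
            region, value, unit = m.groups()
            usage[region] = int(value) * UNIT[unit]
    return usage


def client_id(text):
    """Return the DRM client id in an fdinfo file, or None if it has none."""
    m = CLIENT_LINE.search(text)
    return int(m.group(1)) if m else None


def scan_processes(proc="/proc"):
    """Return {pid: vram_bytes}, counting each DRM client once per process."""
    totals = {}
    try:
        pids = [p for p in os.listdir(proc) if p.isdigit()]
    except OSError:
        return totals
    for pid in pids:
        fd_dir = os.path.join(proc, pid, "fdinfo")
        seen = set()
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for name in names:
            try:
                with open(os.path.join(fd_dir, name)) as f:
                    text = f.read()
            except OSError:
                continue
            cid = client_id(text)
            if cid is None or cid in seen:
                continue  # not a DRM fd, or a duplicate of the same client
            seen.add(cid)
            vram = parse_fdinfo(text).get("vram", 0)
            totals[int(pid)] = totals.get(int(pid), 0) + vram
    return totals


if __name__ == "__main__":
    ranked = sorted(scan_processes().items(), key=lambda kv: kv[1], reverse=True)
    for pid, vram in ranked[:10]:
        print(f"{pid:>8}  {vram / 2**20:10.1f} MiB VRAM")
```

Running this periodically while the problem builds up should show which PID's VRAM figure keeps growing. Buffers shared between processes may be counted more than once, so the totals are a rough ranking rather than exact accounting.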