AMD Radeon out of memory causes system instability.

Christian König ckoenig.leichtzumerken at gmail.com
Fri Sep 27 13:58:20 UTC 2024


On 05.08.24 at 08:02, James Lawrence wrote:
> Apologies if I'm hitting the wrong mailing list. Long-time user, first
> time reporter and all that.

Sorry for the delayed reply. Without a maintainer in CC, such requests
are usually overlooked on the mailing list.

>
> Recently my system has been suffering from instability in the
> graphics system. Essentially, some application on my system is causing
> an OOM condition for graphics memory.
> Normally I'd just expect a hard crash of the application in such a
> scenario. Instead, the system enters a spin loop of command submissions,
> slows down dramatically and generally ends up freezing.
>
> There are a couple issues I'd like to point out with the current 
> situation I'm experiencing:
>
>   * Most importantly, the error message doesn't provide any useful
>     information for tracing the source of the issue: no PID or other
>     diagnostic information.
>   * It's very noisy when trying to debug. I can occasionally drop my
>     system to a separate TTY, but the message spams the entire
>     screen, making it impossible to interact with my system even if I
>     wanted to load up debugging tools to analyze the situation.
>
>
> Given the error message, I believe this line is the source of the log
> statement:
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command
> submission!
> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1431
>
> Generally I'm wondering if there is anything that can be done to 
> improve the experience for end users in such a scenario.
>
> Ideally the system would nuke the misbehaving process, similar to how
> RAM OOMs are handled.
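
While the driver keeps printing the same error, the kernel log becomes much easier to read if consecutive repeats are collapsed into one line with a count. A minimal Python sketch, assuming `dmesg`-style input where each line starts with a `[seconds.fraction]` timestamp; `collapse_repeats` and `print_collapsed` are hypothetical helper names, not part of any existing tool:

```python
import re

# Matches the "[ 1234.567890] " timestamp prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def collapse_repeats(lines):
    """Yield (count, message) pairs, merging consecutive identical messages.

    Timestamps are stripped before comparing, so an error that is logged
    thousands of times in a row becomes a single entry with a count.
    """
    previous = None
    count = 0
    for line in lines:
        message = TIMESTAMP.sub("", line.rstrip("\n"))
        if message == previous:
            count += 1
            continue
        if previous is not None:
            yield count, previous
        previous, count = message, 1
    if previous is not None:
        yield count, previous


def print_collapsed(lines):
    """Print each run of identical messages once, prefixed with its count."""
    for count, message in collapse_repeats(lines):
        print(f"{count:>6}x {message}")
```

For example, a small script that calls `print_collapsed(sys.stdin)` can be fed with `dmesg`. This only tidies what is already logged; it does not add the missing PID.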

If you see this message, the OOM killer should kick in along with it.

If you don't see the OOM killer running, then you have probably run into
a BUG or something like that.
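
Whether the OOM killer actually fired can be checked by scanning the kernel log (for example the output of `dmesg` or `journalctl -k`) for its kill reports. A minimal sketch, assuming the report contains the text `Killed process <pid> (<name>)`; the exact wording has changed between kernel versions, so treat the pattern as an assumption. `oom_victims` is a hypothetical name:

```python
import re

# Assumed shape of an OOM kill report, e.g.
#   "Out of memory: Killed process 1234 (someapp) total-vm:..."
# Only the "Killed process <pid> (<comm>)" part is relied upon.
# Note: a command name containing ")" would be truncated by this pattern.
KILLED = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(lines):
    """Return a list of (pid, command) tuples for every OOM kill found."""
    victims = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

An empty result while the amdgpu error keeps repeating would fit the situation described in the report, where no process gets killed.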

What kernel version are you using and what did you do to trigger that?

Regards,
Christian.


>
> But at a minimum, I'd like to be able to figure out how to backtrack
> this to the misbehaving process. Any help in this regard would be
> appreciated.
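
To backtrack graphics memory to a process, one option on recent kernels is the per-file DRM usage statistics that drivers such as amdgpu expose in `/proc/<pid>/fdinfo/<fd>`. A minimal sketch, assuming the key names `drm-driver`, `drm-client-id` and either `drm-memory-vram` (older kernels) or `drm-total-vram` (newer kernels); which keys are present depends on the kernel version, and all helper names below are hypothetical. Reading other users' fdinfo files generally requires root:

```python
import os

# Multipliers for the optional unit suffix in fdinfo memory values.
UNITS = {"": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_fdinfo(text):
    """Split one /proc/<pid>/fdinfo/<fd> file into a key -> value dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def vram_bytes(fields):
    """Return the VRAM held by an amdgpu DRM client in bytes, or None.

    Assumed key names: "drm-memory-vram" on older kernels and
    "drm-total-vram" on newer ones; neither is guaranteed to exist.
    """
    if fields.get("drm-driver") != "amdgpu":
        return None
    raw = fields.get("drm-memory-vram") or fields.get("drm-total-vram")
    if raw is None:
        return None
    number, _, unit = raw.partition(" ")
    return int(number) * UNITS.get(unit.strip(), 1)


def read_text(path):
    """Read a small /proc file, returning None if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def vram_by_process(proc="/proc"):
    """Map pid -> (command name, VRAM bytes), counting each client id once."""
    usage = {}
    for pid in filter(str.isdigit, os.listdir(proc)):
        fdinfo_dir = os.path.join(proc, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:  # process exited or access denied
            continue
        seen_clients = set()
        total = 0
        for fd in fds:
            text = read_text(os.path.join(fdinfo_dir, fd))
            if text is None:
                continue
            fields = parse_fdinfo(text)
            size = vram_bytes(fields)
            if size is None:
                continue
            client = fields.get("drm-client-id")
            if client is not None:
                if client in seen_clients:
                    continue  # same DRM client reached through another fd
                seen_clients.add(client)
            total += size
        if total:
            name = (read_text(os.path.join(proc, pid, "comm")) or "?").strip()
            usage[int(pid)] = (name, total)
    return usage


def print_top(limit=10):
    """Print the processes holding the most VRAM, largest first."""
    ranked = sorted(vram_by_process().items(), key=lambda item: item[1][1], reverse=True)
    for pid, (name, size) in ranked[:limit]:
        print(f"{pid:>7}  {name:<16} {size // 2**20:>8} MiB")
```

Calling `print_top()` lists the processes holding the most VRAM. Counting each `drm-client-id` once per process avoids double-counting a client reachable through several file descriptors; a client shared by several processes (for example after `fork()`) can still show up under each of them.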


More information about the amd-gfx mailing list