AMD Radeon out of memory causes system instability.
James Lawrence
james at egdaemon.com
Fri Sep 27 17:42:45 UTC 2024
No worries Christian! I know everyone puts a lot of work in and stuff gets lost. I was mainly just firing the information into the ether in the hope that someone would see it and go 'oh yeah' ;)
I was running kernels 6.9-6.10. I haven't hit it with 6.11, but that's likely because I haven't been using my computer as much recently.
I haven't tracked down a cause, but usually I'm programming/gaming and watching a movie, so we usually have steam, a video game, vlc, firefox, sway, and vscode in the mix. I'm fairly certain it's an application bug; I just don't know which one, since I never do only one thing.
I'm guessing the misbehaving application doesn't handle errors correctly and retries the command submission so rapidly that it causes resource starvation, which leads to the lock-up and keeps the OOM killer from kicking in the way your email says it should.
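One possible way to backtrack GPU memory use to a process is sketched below. It is an assumption rather than anything confirmed in this thread: it relies on the kernel exposing the DRM usage-stats keys (`drm-driver`, `drm-client-id`, `drm-memory-<region>`) in /proc/<pid>/fdinfo, and the helper names `parse_fdinfo` and `gpu_memory_by_pid` are made up for illustration. It only reports per-process totals; it does not hook into the amdgpu error path.

```python
import os
import re
from collections import defaultdict

# Matches lines such as "drm-memory-vram:\t123456 KiB". The key format is an
# assumption based on the kernel's DRM usage-stats documentation.
_MEM_RE = re.compile(r"^drm-memory-(\w+):\s*(\d+)\s*(KiB|MiB)?\s*$")
_UNIT = {None: 1, "KiB": 1024, "MiB": 1024 * 1024}


def parse_fdinfo(text):
    """Return (client_id, {region: bytes}) for one fdinfo file, or None if it is not a DRM fd."""
    client_id = None
    is_drm = False
    regions = {}
    for line in text.splitlines():
        if line.startswith("drm-driver:"):
            is_drm = True
        elif line.startswith("drm-client-id:"):
            client_id = line.split(":", 1)[1].strip()
        else:
            m = _MEM_RE.match(line)
            if m:
                regions[m.group(1)] = int(m.group(2)) * _UNIT[m.group(3)]
    return (client_id, regions) if is_drm else None


def gpu_memory_by_pid(proc_root="/proc"):
    """Sum DRM memory per PID, counting each drm-client-id once per process."""
    usage = {}
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fdinfo_dir = os.path.join(proc_root, pid, "fdinfo")
        try:
            names = os.listdir(fdinfo_dir)
        except OSError:
            continue  # process exited or we lack permission
        seen = set()
        totals = defaultdict(int)
        for name in names:
            try:
                with open(os.path.join(fdinfo_dir, name)) as f:
                    parsed = parse_fdinfo(f.read())
            except OSError:
                continue
            if parsed is None:
                continue
            client_id, regions = parsed
            if client_id is not None:
                if client_id in seen:
                    continue  # duplicated fd of the same DRM client
                seen.add(client_id)
            for region, size in regions.items():
                totals[region] += size
        if totals:
            usage[int(pid)] = dict(totals)
    return usage
```

Calling `gpu_memory_by_pid()` as root and sorting the result by its "vram" entry should show which PIDs hold the most graphics memory. Reading other users' fdinfo files generally needs elevated privileges, so an unprivileged run may only list your own processes.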
On Friday, September 27th, 2024 at 9:58 AM, Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> On 05.08.24 at 08:02, James Lawrence wrote:
>
>> Apologies if I'm hitting the wrong mailing list. Long-time user, first-time reporter and all that.
>
> Sorry for the delayed reply. Without a maintainer in CC, such requests are usually overlooked on the mailing list.
>
>> Recently my system has been suffering from instability in the graphics system. Essentially, some application on my system is causing OOM for graphics memory.
>> Normally I'd just expect a hard crash of the application in such a scenario. Instead the system enters a spin loop of command submissions
>> and slows down dramatically, generally resulting in the system freezing up.
>>
>> There are a couple of issues I'd like to point out with the situation I'm experiencing:
>>
>> - Most importantly, the error message doesn't provide any useful information for tracing the source of the issue: no PID or other diagnostic information.
>> - It's very noisy when trying to debug. I can occasionally drop to a separate TTY, but the message just spams the entire screen, making it impossible to interact with my system even if I wanted to load up debugging tools to analyze the situation.
>>
>> Given the error message, I believe this line is the source of the log statement:
>> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
>> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1431
>>
>> Generally I'm wondering if there is anything that can be done to improve the experience for end users in such a scenario.
>>
>> Ideally the system would nuke the misbehaving process, similar to how RAM OOMs are handled,
>
> If you see this message you should get the OOM killer running along with it.
>
> If you don't see that, then you have probably run into a BUG or something like that.
>
> What kernel version are you using and what did you do to trigger that?
>
> Regards,
> Christian.
>
>> but at a minimum I'd like to be able to figure out how to backtrack this to the misbehaving process. Any help in this regard would be appreciated.