[Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"

Andreas Thalhammer andreas.thalhammer-linux at gmx.net
Mon Oct 24 16:19:17 UTC 2022


On 24.10.22 at 13:31, Thomas Zimmermann wrote:
> Hi
>
> On 24.10.22 at 13:27, Greg KH wrote:
>> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>>> Hi! Thx for the reply.
>>>
>>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>>> On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
>>>>>
>>>>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>>>>> kernel developers don't keep an eye on it, I decided to forward it by
>>>>> mail. Quoting from
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>>>>
>>>>>>    Andreas 2022-10-22 14:25:32 UTC
>>>>>>
>>>>>> Created attachment 303074 [details]
>>>>>> dmesg
>>>>
>>>> I've looked at the kernel log and found that simpledrm has been loaded
>>>> *after* amdgpu, which should never happen. The problematic patch has
>>>> been taken from a long list of refactoring work on this code. No wonder
>>>> that it doesn't work as expected.
>>>>
>>>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
>>>> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
>>>> report on the results. It should fix the problem.
>>>
>>> Greg, is that enough for you to pick this up? Or do you want Andreas to
>>> test first if it really fixes the reported problem?
>>
>> This should be good enough.  If this does NOT fix the issue, please let
>> me know.
>
> Thanks a lot. I think I can provide a dedicated fix if the proposed
> commit doesn't work.
>
> Best regards
> Thomas
>
>>
>> thanks,
>>
>> greg k-h
>

Thanks... In short: the additional patch did NOT fix the problem.

I don't use git and I don't know how to /cherry-pick commit/
9d69ef183815, but I found the patch here:
https://patchwork.freedesktop.org/patch/494609/

I hope that's the right one. I reintegrated
v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
and also applied
v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
(same .config).

Now the system doesn't even boot to a console. The first boot got me to
an rcu_sched stall on CPUs/tasks, same as above, but this time with:
Workqueue: btrfs-cache btrfs_work_helper

I booted a second time with the same kernel, and it got stuck after
mounting the root btrfs filesystem. (It looked like a total freeze, but
when it didn't show an rcu_stall message after ~2 min I got impatient and
wanted to see if I had just busted my root filesystem...)

I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
should update my backup right away!)

I will try 6.1-rc1 next, bear with me...


