[Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"

Andreas Thalhammer andreas.thalhammer-linux at gmx.net
Tue Oct 25 08:45:42 UTC 2022


On 25.10.22 at 10:16, Thomas Zimmermann wrote:
> Hi Andreas
>
> On 24.10.22 at 18:19, Andreas Thalhammer wrote:
>> On 24.10.22 at 13:31, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> On 24.10.22 at 13:27, Greg KH wrote:
>>>> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>>>>> Hi! Thx for the reply.
>>>>>
>>>>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>>>>> On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
>>>>>>>
>>>>>>> I noticed a regression report in bugzilla.kernel.org. As many (most?)
>>>>>>> kernel developers don't keep an eye on it, I decided to forward it by
>>>>>>> mail. Quoting from
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216616  :
>>>>>>>
>>>>>>>>    Andreas 2022-10-22 14:25:32 UTC
>>>>>>>>
>>>>>>>> Created attachment 303074 [details]
>>>>>>>> dmesg
>>>>>>
>>>>>> I've looked at the kernel log and found that simpledrm has been loaded
>>>>>> *after* amdgpu, which should never happen. The problematic patch has
>>>>>> been taken from a long list of refactoring work on this code. No wonder
>>>>>> that it doesn't work as expected.
>>>>>>
>>>>>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
>>>>>> remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
>>>>>> report on the results. It should fix the problem.
>>>>>
>>>>> Greg, is that enough for you to pick this up? Or do you want Andreas to
>>>>> test first if it really fixes the reported problem?
>>>>
>>>> This should be good enough.  If this does NOT fix the issue, please let
>>>> me know.
>>>
>>> Thanks a lot. I think I can provide a dedicated fix if the proposed
>>> commit doesn't work.
>>>
>>> Best regards
>>> Thomas
>>>
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>
>>
>> Thanks... In short: the additional patch did NOT fix the problem.
>
> Yeah, it's also part of a larger changeset. But I wouldn't want to
> backport all those changes either.
>
> Attached is a simple patch for linux-stable that adds the necessary fix.
> If this still doesn't work, we should probably revert the problematic
> patch.
>
> Please test the patch and let me know if it works.


Yes, this fixed the problem. I'm running 6.0.3 with your patch now, all
fine.

Thanks!
Andreas

>
> Best regards
> Thomas
>
>>
>> I don't use git and I don't know how to /cherry-pick commit/
>> 9d69ef183815, but I found the patch here:
>> https://patchwork.freedesktop.org/patch/494609/
>>
>> I hope that's the right one. I reintegrated
>> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
>> and also applied
>> v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
>> did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
>> (same .config).
>>
>> Now the system doesn't even boot to a console. The first boot got me to
>> an rcu_sched stall on CPUs/tasks, same as above, but this time with:
>> Workqueue: btrfs-cache btrfs_work_helper
>>
>> I booted a second time with the same kernel, and it got stuck after
>> mounting the root btrfs filesystem (what looked like a total freeze, but
>> when it didn't show a rcu_stall message after ~2 min I got impatient and
>> wanted to see if I had just busted my root filesystem...)
>>
>> I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
>> should update my backup right away!)
>>
>> I will try 6.1-rc1 next, bear with...
>>
>


