[Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"
Thomas Zimmermann
tzimmermann at suse.de
Mon Oct 24 10:26:43 UTC 2022
Hi
On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
>
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616 :
>
>> Andreas 2022-10-22 14:25:32 UTC
>>
>> Created attachment 303074
>> dmesg
I've looked at the kernel log and found that simpledrm has been loaded
*after* amdgpu, which should never happen. The problematic patch was
taken from a longer series of refactoring work on this code, so it is
no wonder that it doesn't work as expected on its own.
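For anyone who wants to repeat that check on their own log, here is a minimal sketch in Python. The helper names are hypothetical, and it assumes that the first line mentioning a driver name is a good enough proxy for when that driver came up, which is only a rough heuristic.

```python
import re


def first_mentions(log_lines, drivers=("amdgpu", "simpledrm")):
    """Return the driver names ordered by the first log line that mentions them."""
    first_seen = {}
    for index, line in enumerate(log_lines):
        for name in drivers:
            # Whole-word match, so a longer identifier containing the name is ignored.
            if name not in first_seen and re.search(rf"\b{re.escape(name)}\b", line):
                first_seen[name] = index
    return sorted(first_seen, key=first_seen.get)


def simpledrm_after_amdgpu(log_lines):
    """True if both drivers appear and simpledrm is first mentioned after amdgpu."""
    return first_mentions(log_lines) == ["amdgpu", "simpledrm"]
```

Calling `simpledrm_after_amdgpu(open("dmesg.txt"))` on a saved copy of the log (the file name is an assumption) returns True for the ordering described above.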
Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove
remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and
report on the results. It should fix the problem.
Best regards
Thomas
>>
>> 6.0.2 works.
>>
>> On 6.0.3 the system is very sluggish with graphic glitches all over the place in KDE Plasma Desktop X11 (no graphic glitches when using Wayland, but also sluggish). SDDM works fine.
>>
>> Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne", hybrid graphics AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and Nvidia GeForce RTX 3070 Mobile (GA104M, not working with nouveau, I'm not using the proprietary nvidia driver).
>>
>> Comment 1 Andreas 2022-10-22 14:27:15 UTC
>>
>> Created attachment 303075
>> my kernel .config for 6.0.3
>>
>> Only CONFIG_HID_TOPRE was added in 6.0.3; otherwise it is identical to my .config for 6.0.2.
>>
>> Comment 2 Andreas 2022-10-22 14:51:23 UTC
>>
>> In /var/log/Xorg.0.log the only obvious difference is the last line:
>> ---- snap
>> randr: falling back to unsynchronized pixmap sharing
>> ---- snap
>> The line is present when I boot with 6.0.3, but isn't when I boot 6.0.2.
>>
>> (Obviously this is when I login to KDE with X11, not with Wayland, from SDDM.)
>>
>> Comment 3 Andreas 2022-10-22 22:10:19 UTC
>>
>> I did a git bisect on stable kernels with 6.0.3 as bad and 6.0.2 as good; this is the result:
>>
>> cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
>> commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
>> Author: Thomas Zimmermann <tzimmermann at suse.de>
>> Date: Mon Jul 18 09:23:18 2022 +0200
>>
>> video/aperture: Disable and unregister sysfb devices via aperture helpers
>>
>> [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]
>>
>> Call sysfb_disable() before removing conflicting devices in aperture
>> helpers. Fixes sysfb state if fbdev has been disabled.
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann at suse.de>
>> Reviewed-by: Javier Martinez Canillas <javierm at redhat.com>
>> Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")
>>
>> Comment 4 Andreas 2022-10-22 22:11:51 UTC
>>
>> Link to the suspect patch:
>>
>> https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmermann@suse.de
>> (or https://patchwork.freedesktop.org/patch/494608/)
>>
>> Comment 5 Andreas 2022-10-22 22:38:14 UTC
>>
>> Okay, so I reverted v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch on stable 6.0.3 and the fault is gone.
>>
>> I always logged out immediately, which worked (even though everything is very very sluggish). Also, when I killed the X session within a couple of seconds (15 or so), no error was shown (I used "systemctl stop sddm" from another virtual console).
>>
>> Noteworthy: I once compiled a kernel from within the Plasma Desktop, while it was sluggish. The kernel compiled alright. When it was finished I moved the mouse to reboot, at which point it completely froze and I had to hard-reset the system.
>>
>> While still running, after > 15 seconds, the fault looked like this (dmesg):
>> ---- snap ----
>> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
>> rcu: blocking rcu_node structures (internal RCU debug):
>> Task dump for CPU 13:
>> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
>> Call Trace:
>> <TASK>
>> ? commit_tail+0xd7/0x130
>> ? drm_atomic_helper_commit+0x126/0x150
>> ? drm_atomic_commit+0xa4/0xe0
>> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>> ? drm_atomic_helper_dirtyfb+0x19e/0x280
>> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? drm_ioctl_kernel+0xc4/0x150
>> ? drm_ioctl+0x246/0x3f0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? __x64_sys_ioctl+0x91/0xd0
>> ? do_syscall_64+0x60/0xd0
>> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>> </TASK>
>> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 29 jiffies s: 165 root: 0x2000/.
>> rcu: blocking rcu_node structures (internal RCU debug):
>> Task dump for CPU 13:
>> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
>> Call Trace:
>> <TASK>
>> ? commit_tail+0xd7/0x130
>> ? drm_atomic_helper_commit+0x126/0x150
>> ? drm_atomic_commit+0xa4/0xe0
>> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>> ? drm_atomic_helper_dirtyfb+0x19e/0x280
>> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? drm_ioctl_kernel+0xc4/0x150
>> ? drm_ioctl+0x246/0x3f0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? __x64_sys_ioctl+0x91/0xd0
>> ? do_syscall_64+0x60/0xd0
>> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>> </TASK>
>> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 8 jiffies s: 169 root: 0x2000/.
>> rcu: blocking rcu_node structures (internal RCU debug):
>> Task dump for CPU 13:
>> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
>> Call Trace:
>> <TASK>
>> ? memcpy_toio+0x76/0xc0
>> ? drm_fb_memcpy_toio+0x76/0xb0
>> ? drm_fb_blit_toio+0x75/0x2b0
>> ? simpledrm_simple_display_pipe_update+0x132/0x150
>> ? drm_atomic_helper_commit_planes+0xb6/0x230
>> ? drm_atomic_helper_commit_tail+0x44/0x80
>> ? commit_tail+0xd7/0x130
>> ? drm_atomic_helper_commit+0x126/0x150
>> ? drm_atomic_commit+0xa4/0xe0
>> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>> ? drm_atomic_helper_dirtyfb+0x19e/0x280
>> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? drm_ioctl_kernel+0xc4/0x150
>> ? drm_ioctl+0x246/0x3f0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? __x64_sys_ioctl+0x91/0xd0
>> ? do_syscall_64+0x60/0xd0
>> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>> </TASK>
>> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
>> rcu: blocking rcu_node structures (internal RCU debug):
>> Task dump for CPU 13:
>> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
>> Call Trace:
>> <TASK>
>> ? memcpy_toio+0x76/0xc0
>> ? memcpy_toio+0x1b/0xc0
>> ? drm_fb_memcpy_toio+0x76/0xb0
>> ? drm_fb_blit_toio+0x75/0x2b0
>> ? simpledrm_simple_display_pipe_update+0x132/0x150
>> ? drm_atomic_helper_commit_planes+0xb6/0x230
>> ? drm_atomic_helper_commit_tail+0x44/0x80
>> ? commit_tail+0xd7/0x130
>> ? drm_atomic_helper_commit+0x126/0x150
>> ? drm_atomic_commit+0xa4/0xe0
>> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>> ? drm_atomic_helper_dirtyfb+0x19e/0x280
>> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? drm_ioctl_kernel+0xc4/0x150
>> ? drm_ioctl+0x246/0x3f0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? __x64_sys_ioctl+0x91/0xd0
>> ? do_syscall_64+0x60/0xd0
>> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>> </TASK>
>> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 52 jiffies s: 169 root: 0x2000/.
>> rcu: blocking rcu_node structures (internal RCU debug):
>> Task dump for CPU 13:
>> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
>> Call Trace:
>> <TASK>
>> ? memcpy_toio+0x76/0xc0
>> ? memcpy_toio+0x1b/0xc0
>> ? drm_fb_memcpy_toio+0x76/0xb0
>> ? drm_fb_blit_toio+0x75/0x2b0
>> ? simpledrm_simple_display_pipe_update+0x132/0x150
>> ? drm_atomic_helper_commit_planes+0xb6/0x230
>> ? drm_atomic_helper_commit_tail+0x44/0x80
>> ? commit_tail+0xd7/0x130
>> ? drm_atomic_helper_commit+0x126/0x150
>> ? drm_atomic_commit+0xa4/0xe0
>> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
>> ? drm_atomic_helper_dirtyfb+0x19e/0x280
>> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? drm_ioctl_kernel+0xc4/0x150
>> ? drm_ioctl+0x246/0x3f0
>> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
>> ? __x64_sys_ioctl+0x91/0xd0
>> ? do_syscall_64+0x60/0xd0
>> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
>> </TASK>
>> traps: avahi-ml[4447] general protection fault ip:7fdde6a37bc1 sp:7fdde07fc920 error:0 in module-zeroconf-publish.so[7fdde6a37000+3000]
>>
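The repeated stall reports above are easier to compare once condensed. The following is a rough Python sketch, not an established tool: `summarize_stalls` is a hypothetical helper, and the regular expression assumes the message layout seen in the quoted lines (`{ <cpu>-.... } <n> jiffies s: <seq>`). It groups the reports by sequence number and keeps the longest reported stall for each.

```python
import re
from collections import defaultdict

# Layout assumed from the quoted dmesg lines, not from a format specification.
STALL_RE = re.compile(
    r"rcu: INFO: \w+ detected expedited stalls on CPUs/tasks: "
    r"\{ (?P<cpus>.*?) \} (?P<jiffies>\d+) jiffies s: (?P<seq>\d+)"
)


def summarize_stalls(log_lines):
    """Map each stall sequence number to (sorted CPU list, longest reported jiffies)."""
    cpus_by_seq = defaultdict(set)
    longest_by_seq = defaultdict(int)
    for line in log_lines:
        match = STALL_RE.search(line)
        if match is None:
            continue  # not a stall report (call trace entries, other messages)
        seq = int(match["seq"])
        # CPU entries look like "13-...."; take the number in front of the dash.
        cpus_by_seq[seq].update(int(n) for n in re.findall(r"(\d+)-", match["cpus"]))
        longest_by_seq[seq] = max(longest_by_seq[seq], int(match["jiffies"]))
    return {seq: (sorted(cpus_by_seq[seq]), longest_by_seq[seq]) for seq in longest_by_seq}
```

Applied to the five reports quoted above, this yields two sequences, 165 peaking at 29 jiffies and 169 peaking at 52 jiffies, both on CPU 13.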
>
> See the ticket for more details.
>
> BTW, let me use this mail to also add the report to the list of tracked
> regressions to ensure it doesn't fall through the cracks:
>
> #regzbot introduced: cfecfc98a78d9
> https://bugzilla.kernel.org/show_bug.cgi?id=216616
> #regzbot ignore-activity
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev