[PATCH] fbdev: defio: fix the pagelist corruption
Liu, Chuansheng
chuansheng.liu at intel.com
Mon Mar 28 00:58:17 UTC 2022
Hi Paul,
> -----Original Message-----
> From: dri-devel <dri-devel-bounces at lists.freedesktop.org> On Behalf Of Paul Menzel
> Sent: Saturday, March 26, 2022 4:11 PM
> To: Liu, Chuansheng <chuansheng.liu at intel.com>
> Cc: linux-fbdev at vger.kernel.org; deller at gmx.de; dri-devel at lists.freedesktop.org; tzimmermann at suse.de; jayalk at intworks.biz
> Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption
>
> Dear Chuansheng,
>
>
> On 17.03.22 at 06:46, Chuansheng Liu wrote:
> > The following list corruption is easy to hit:
> > ==
> > list_add corruption. prev->next should be next (ffffffffc0ceb090), but was ffffec604507edc8. (prev=ffffec604507edc8).
> > WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26 __list_add_valid+0x53/0x80
> > CPU: 65 PID: 3959 Comm: fbdev Tainted: G U
> > RIP: 0010:__list_add_valid+0x53/0x80
> > Call Trace:
> > <TASK>
> > fb_deferred_io_mkwrite+0xea/0x150
> > do_page_mkwrite+0x57/0xc0
> > do_wp_page+0x278/0x2f0
> > __handle_mm_fault+0xdc2/0x1590
> > handle_mm_fault+0xdd/0x2c0
> > do_user_addr_fault+0x1d3/0x650
> > exc_page_fault+0x77/0x180
> > ? asm_exc_page_fault+0x8/0x30
> > asm_exc_page_fault+0x1e/0x30
> > RIP: 0033:0x7fd98fc8fad1
> > ==
> >
> > The race happens when one process is adding &page->lru to the tail of
> > the pagelist in fb_deferred_io_mkwrite() while another process is
> > re-initializing the same &page->lru in fb_deferred_io_fault(), which is
> > not protected by the lock.
> >
> > This fix initializes all the page lists once during initialization; it
> > not only fixes the list corruption but also avoids redundant
> > INIT_LIST_HEAD() calls.
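A minimal userspace sketch of the interleaving described above, replayed on a single thread so the outcome is deterministic. It assumes a simplified copy of the kernel's struct list_head and a hand-written check modeled on the message that __list_add_valid() prints under CONFIG_DEBUG_LIST; list_add_tail_checked() and replay() are hypothetical names, not kernel API. When the node that is currently the list tail gets re-initialized, the head's prev still points at it while its next points at itself, so the next tail insertion sees prev->next == prev, the same shape as the WARNING in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Tail insertion with a sanity check modeled on CONFIG_DEBUG_LIST:
 * refuse to link the node if the tail no longer points back at the head. */
static bool list_add_tail_checked(struct list_head *node, struct list_head *head)
{
	struct list_head *prev = head->prev;
	struct list_head *next = head;

	assert(node != head);
	if (prev->next != next) {
		printf("list_add corruption. prev->next should be next (%p), "
		       "but was %p. (prev=%p).\n",
		       (void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	next->prev = node;
	node->next = next;
	node->prev = prev;
	prev->next = node;
	return true;
}

/* Hypothetical replay of the race; returns true if corruption was caught.
 * reinit_in_fault models the pre-patch INIT_LIST_HEAD() in the fault path. */
static bool replay(bool reinit_in_fault)
{
	struct list_head pagelist, lru_a, lru_b;

	/* one-time setup, as fb_deferred_io_init() does after the patch */
	init_list_head(&pagelist);
	init_list_head(&lru_a);
	init_list_head(&lru_b);

	/* process 1, mkwrite on page A: enlist it at the tail */
	if (!list_add_tail_checked(&lru_a, &pagelist))
		return true;

	/* process 2, fault on page A, outside the lock */
	if (reinit_in_fault)
		init_list_head(&lru_a);

	/* process 1 again, mkwrite on page B: append after page A */
	return !list_add_tail_checked(&lru_b, &pagelist);
}
```

Here replay(true) prints a line in which the "was" and "prev" addresses are equal and returns true, while replay(false), which models a fault handler that no longer touches page->lru, appends both nodes cleanly and returns false.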
> >
> > Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
> > Cc: Thomas Zimmermann <tzimmermann at suse.de>
> > Signed-off-by: Chuansheng Liu <chuansheng.liu at intel.com>
> > ---
> > drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> > index 98b0f23bf5e2..eafb66ca4f28 100644
> > --- a/drivers/video/fbdev/core/fb_defio.c
> > +++ b/drivers/video/fbdev/core/fb_defio.c
> > @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
> > printk(KERN_ERR "no mapping available\n");
> >
> > BUG_ON(!page->mapping);
> > - INIT_LIST_HEAD(&page->lru);
> > page->index = vmf->pgoff;
> >
> > vmf->page = page;
> > @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
> > void fb_deferred_io_init(struct fb_info *info)
> > {
> > struct fb_deferred_io *fbdefio = info->fbdefio;
> > + struct page *page;
> > + int i;
> >
> > BUG_ON(!fbdefio);
> > mutex_init(&fbdefio->lock);
> > @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
> > INIT_LIST_HEAD(&fbdefio->pagelist);
> > if (fbdefio->delay == 0) /* set a default of 1 s */
> > fbdefio->delay = HZ;
> > +
> > + /* initialize all the page lists one time */
> > + for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
> > + page = fb_deferred_io_page(info, i);
> > + INIT_LIST_HEAD(&page->lru);
> > + }
> > }
> > EXPORT_SYMBOL_GPL(fb_deferred_io_init);
> >
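The loop added to fb_deferred_io_init() walks the framebuffer in PAGE_SIZE steps, so a buffer whose smem_len is not a multiple of the page size still has its last, partial page initialized. The userspace sketch below mirrors only that loop shape; struct fake_page, page_for_offset() (a stand-in for fb_deferred_io_page()) and the fixed 4 KiB page size are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct page; only the lru node matters here. */
struct fake_page {
	struct list_head lru;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Hypothetical stand-in for fb_deferred_io_page(): maps a byte offset in the
 * framebuffer to its backing page. The caller must size the array to cover
 * smem_len rounded up to whole pages. */
static struct fake_page *page_for_offset(struct fake_page *pages, size_t offs)
{
	return &pages[offs / SKETCH_PAGE_SIZE];
}

/* Same loop shape as the patch: step through [0, smem_len) one page at a
 * time and initialize each page's list node exactly once. Returns the number
 * of pages touched, i.e. smem_len rounded up to whole pages. */
static size_t init_all_page_lists(struct fake_page *pages, size_t smem_len)
{
	size_t i, n = 0;

	for (i = 0; i < smem_len; i += SKETCH_PAGE_SIZE) {
		init_list_head(&page_for_offset(pages, i)->lru);
		n++;
	}
	return n;
}
```

For example, init_all_page_lists(pages, 10000) touches three pages (offsets 0, 4096 and 8192), matching 10000 bytes rounded up to whole 4 KiB pages.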
> With your patch applied on top of Linus’ current master branch, tty0 is
> unusable and looks frozen. Sometimes the network card still works,
> sometimes not.
I don't see how the patch would cause the BUG call stack below; I need some
time to debug it. Just a few comments:
1. Does the system work well without this patch?
2. If you are sure the patch causes the regression you saw, please feel free
to submit a revert patch, thanks : )
>
> $ git log --oneline -nodecorate -2
> 1b351a77ed33 (HEAD -> linus) fbdev: defio: fix the pagelist corruption
> 52d543b5497c (origin/master, origin/HEAD) Merge tag 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi
>
> ```
> [ 5.256996] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 5.269582] page dumped because: VM_BUG_ON_PAGE(compound && compound_order(page) != order)
> [ 5.279507] ------------[ cut here ]------------
> [ 5.286406] kernel BUG at mm/page_alloc.c:1326!
> [ 5.291814] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 5.296350] CPU: 0 PID: 167 Comm: systemd-udevd Not tainted 5.17.0-10753-g1b351a77ed33 #300
> [ 5.304670] Hardware name: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.16-337-gb87986e67b 03/25/2022
> [ 5.313163] RIP: 0010:free_pcp_prepare+0x295/0x400
> [ 5.317930] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
> [ 5.336650] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
> [ 5.341849] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 0000000000000000
> [ 5.348957] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 00000000ffffffff
> [ 5.356063] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 00000000ffffdfff
> [ 5.363170] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 0000000000000000
> [ 5.370277] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: ffffe4be840c0000
> [ 5.377384] FS: 0000000000000000(0000) GS:ffff91fd7b400000(0063) knlGS:00000000f7eea800
> [ 5.385443] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> [ 5.391164] CR2: 00000000f6f0e840 CR3: 0000000106b60000 CR4: 00000000000406f0
> [ 5.398272] Call Trace:
> [ 5.400697] <TASK>
> [ 5.402778] free_unref_page+0x1b/0xf0
> [ 5.406505] __vunmap+0x216/0x2c0
> [ 5.409798] drm_fbdev_cleanup+0x5f/0xb0
> [ 5.413698] drm_fbdev_fb_destroy+0x15/0x30
> [ 5.417857] unregister_framebuffer+0x2c/0x40
> [ 5.422191] drm_client_dev_unregister+0x69/0xe0
> [ 5.422962] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.17
> [ 5.426784] drm_dev_unregister+0x2e/0x80
> [ 5.439005] drm_dev_unplug+0x21/0x40
> [ 5.442645] simpledrm_remove+0x11/0x20
> [ 5.446458] platform_remove+0x1f/0x40
> [ 5.450185] __device_release_driver+0x17a/0x250
> [ 5.454779] device_release_driver+0x24/0x30
> [ 5.459024] bus_remove_device+0xd8/0x140
> [ 5.463012] device_del+0x18b/0x3f0
> [ 5.466478] ? idr_alloc_cyclic+0x50/0xb0
> [ 5.470466] platform_device_del.part.0+0x13/0x70
> [ 5.475146] platform_device_unregister+0x1c/0x30
> [ 5.479824] drm_aperture_detach_drivers+0xa1/0xd0
> [ 5.484593] drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x60
> [ 5.491179] radeon_pci_probe+0x54/0xf0 [radeon]
> [ 5.495773] local_pci_probe+0x45/0x80
> [ 5.499499] ? pci_match_device+0xd7/0x130
> [ 5.503572] pci_device_probe+0xc2/0x1e0
> [ 5.507474] really_probe+0x1f5/0x3d0
> [ 5.511112] __driver_probe_device+0xfe/0x180
> [ 5.515446] driver_probe_device+0x1e/0x90
> [ 5.519518] __driver_attach+0xc0/0x1c0
> [ 5.523332] ? __device_attach_driver+0xe0/0xe0
> [ 5.527839] ? __device_attach_driver+0xe0/0xe0
> [ 5.532346] bus_for_each_dev+0x78/0xc0
> [ 5.536159] bus_add_driver+0x149/0x1e0
> [ 5.539973] driver_register+0x8f/0xe0
> [ 5.543699] ? 0xffffffffc0741000
> [ 5.546992] do_one_initcall+0x44/0x200
> [ 5.550806] ? kmem_cache_alloc_trace+0x170/0x2c0
> [ 5.555487] do_init_module+0x4c/0x240
> [ 5.559213] __do_sys_finit_module+0xb4/0x120
> [ 5.563547] __do_fast_syscall_32+0x6b/0xe0
> [ 5.567706] do_fast_syscall_32+0x2f/0x70
> [ 5.571693] entry_SYSCALL_compat_after_hwframe+0x45/0x4d
> [ 5.577067] RIP: 0023:0xf7efa549
> [ 5.580273] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 cd 0f 05 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
> [ 5.582805] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> [ 5.598992] RSP: 002b:00000000ff831c0c EFLAGS: 00200296 ORIG_RAX: 000000000000015e
> [ 5.598996] RAX: ffffffffffffffda RBX: 0000000000000011 RCX: 00000000f7ed9e09
> [ 5.598998] RDX: 0000000000000000 RSI: 0000000056a5c940 RDI: 0000000056a5c4c0
> [ 5.598999] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 5.635047] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 5.642154] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 5.649264] </TASK>
> [ 5.651427] Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi radeon(+) r8169 xhci_pci(+) realtek snd_hda_intel drm_ttm_helper snd_intel_dspcfg k10temp snd_hda_codec ttm snd_hda_core xhci_hcd snd_pcm sg ohci_hcd ehci_pci(+) snd_timer drm_dp_helper snd ehci_hcd soundcore i2c_piix4 acpi_cpufreq coreboot_table fuse ipv6 autofs4
> [ 5.690975] r8169 0000:04:00.0 enp4s0: renamed from eth0
> [ 5.691589] ---[ end trace 0000000000000000 ]---
> [ 5.704791] RIP: 0010:free_pcp_prepare+0x295/0x400
> [ 5.709784] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
> [ 5.731535] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
> [ 5.752988] usb usb4: Product: xHCI Host Controller
> [ 5.758571] usb usb4: Manufacturer: Linux 5.17.0-10753-g1b351a77ed33 xhci-hcd
> [ 5.767096] usb usb4: SerialNumber: 0000:03:00.0
> [ 5.772213] hub 4-0:1.0: USB hub found
> [ 5.782383] hub 4-0:1.0: 2 ports detected
> [ 5.799251] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 0000000000000000
> [ 5.810470] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 00000000ffffffff
> [ 5.817561] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 00000000ffffdfff
> [ 5.824680] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 0000000000000000
> [ 5.831739] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: ffffe4be840c0000
> [ 5.839445] FS: 0000000000000000(0000) GS:ffff91fd7b500000(0063) knlGS:00000000f7eea800
> [ 5.847905] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
> [ 5.854025] CR2: 000000005664d26c CR3: 0000000106b60000 CR4: 00000000000406e0
> ```
>
>
> Kind regards,
>
> Paul
>
>
> PS: For some reason, the lore.kernel.org lists most messages twice [1].
>
> PPS: I actually wanted to analyze the new regression and thought your
> patch might help, but it made things worse. ;-) (The log excerpt is from
> Linux master.)
>
> ```
> [ 1.738965] BUG: Bad page state in process systemd-udevd pfn:103003
> [ 1.738974] fbcon: Taking over console
> [ 1.740459] page:00000000c3b5c591 refcount:0 mapcount:0 mapping:0000000000000000 index:0x3 pfn:0x103003
> [ 1.740466] head:000000009b49a8e9 order:9 compound_mapcount:0 compound_pincount:0
> [ 1.740468] flags: 0x2fffc000010000(head|node=0|zone=2|lastcpupid=0x3ffff)
> [ 1.740473] raw: 002fffc000000000 fffff139840c0001 fffff139840c00c8 0000000000000000
> [ 1.740475] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 1.740477] head: 002fffc000010000 0000000000000000 dead000000000122 0000000000000000
> [ 1.740479] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 1.740480] page dumped because: corrupted mapping in tail page
> ```
>
> I am going to do that in another thread.
>
> [1]: https://lore.kernel.org/all/20220317054602.28846-1-chuansheng.liu@intel.com/