[PATCH] fbdev: defio: fix the pagelist corruption

Paul Menzel pmenzel at molgen.mpg.de
Mon Mar 28 06:15:14 UTC 2022


Dear Chuansheng,


Am 28.03.22 um 02:58 schrieb Liu, Chuansheng:

>> -----Original Message-----

>> Sent: Saturday, March 26, 2022 4:11 PM

>> Am 17.03.22 um 06:46 schrieb Chuansheng Liu:
>>> Easily hit the below list corruption:
>>> ==
>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but
>>> was ffffec604507edc8. (prev=ffffec604507edc8).
>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
>>> __list_add_valid+0x53/0x80
>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G     U
>>> RIP: 0010:__list_add_valid+0x53/0x80
>>> Call Trace:
>>>    <TASK>
>>>    fb_deferred_io_mkwrite+0xea/0x150
>>>    do_page_mkwrite+0x57/0xc0
>>>    do_wp_page+0x278/0x2f0
>>>    __handle_mm_fault+0xdc2/0x1590
>>>    handle_mm_fault+0xdd/0x2c0
>>>    do_user_addr_fault+0x1d3/0x650
>>>    exc_page_fault+0x77/0x180
>>>    ? asm_exc_page_fault+0x8/0x30
>>>    asm_exc_page_fault+0x1e/0x30
>>> RIP: 0033:0x7fd98fc8fad1
>>> ==
>>>
>>> Figure out the race happens when one process is adding &page->lru into
>>> the pagelist tail in fb_deferred_io_mkwrite(), another process is
>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is
>>> not protected by the lock.
>>>
>>> This fix is to init all the page lists one time during initialization,
>>> it not only fixes the list corruption, but also avoids INIT_LIST_HEAD()
>>> redundantly.
>>>
>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
>>> Cc: Thomas Zimmermann <tzimmermann at suse.de>
>>> Signed-off-by: Chuansheng Liu <chuansheng.liu at intel.com>
>>> ---
>>>    drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
>>>    1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>>> index 98b0f23bf5e2..eafb66ca4f28 100644
>>> --- a/drivers/video/fbdev/core/fb_defio.c
>>> +++ b/drivers/video/fbdev/core/fb_defio.c
>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>>>    		printk(KERN_ERR "no mapping available\n");
>>>
>>>    	BUG_ON(!page->mapping);
>>> -	INIT_LIST_HEAD(&page->lru);
>>>    	page->index = vmf->pgoff;
>>>
>>>    	vmf->page = page;
>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
>>>    void fb_deferred_io_init(struct fb_info *info)
>>>    {
>>>    	struct fb_deferred_io *fbdefio = info->fbdefio;
>>> +	struct page *page;
>>> +	int i;
>>>
>>>    	BUG_ON(!fbdefio);
>>>    	mutex_init(&fbdefio->lock);
>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
>>>    	INIT_LIST_HEAD(&fbdefio->pagelist);
>>>    	if (fbdefio->delay == 0) /* set a default of 1 s */
>>>    		fbdefio->delay = HZ;
>>> +
>>> +	/* initialize all the page lists one time */
>>> +	for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
>>> +		page = fb_deferred_io_page(info, i);
>>> +		INIT_LIST_HEAD(&page->lru);
>>> +	}
>>>    }
>>>    EXPORT_SYMBOL_GPL(fb_deferred_io_init);
>>>
>> Applying your patch on top of current Linus’ master branch, tty0 is
>> unusable and looks frozen. Sometimes network card still works, sometimes
>> not.
> 
> I don't see how the patch would cause below BUG call stack, need some time to
> debug. Just few comments:
> 1. Will the system work well without this patch?

Yes, the framebuffer works well without the patch.

> 2. When you are sure the patch causes the regression you saw, please get free to submit
> one reverted patch, thanks : )

I think you for patch wasn’t submitted yet – at least not pulled by Linus.

>>       $ git log --oneline -nodecorate -2
>>       1b351a77ed33 (HEAD -> linus) fbdev: defio: fix the pagelist corruption
>>       52d543b5497c (origin/master, origin/HEAD) Merge tag 'for-linus-5.17-1' of https://github.com/cminyard/linux-ipmi
>>
>> ```
>> [    5.256996] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>> [    5.269582] page dumped because: VM_BUG_ON_PAGE(compound && compound_order(page) != order)
>> [    5.279507] ------------[ cut here ]------------
>> [    5.286406] kernel BUG at mm/page_alloc.c:1326!
>> [    5.291814] invalid opcode: 0000 [#1] PREEMPT SMP
>> [    5.296350] CPU: 0 PID: 167 Comm: systemd-udevd Not tainted 5.17.0-10753-g1b351a77ed33 #300
>> [    5.304670] Hardware name: ASUS F2A85-M_PRO/F2A85-M_PRO, BIOS 4.16-337-gb87986e67b 03/25/2022
>> [    5.313163] RIP: 0010:free_pcp_prepare+0x295/0x400
>> [    5.317930] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
>> [    5.336650] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
>> [    5.341849] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 0000000000000000
>> [    5.348957] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 00000000ffffffff
>> [    5.356063] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 00000000ffffdfff
>> [    5.363170] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 0000000000000000
>> [    5.370277] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: ffffe4be840c0000
>> [    5.377384] FS:  0000000000000000(0000) GS:ffff91fd7b400000(0063) knlGS:00000000f7eea800
>> [    5.385443] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
>> [    5.391164] CR2: 00000000f6f0e840 CR3: 0000000106b60000 CR4: 00000000000406f0
>> [    5.398272] Call Trace:
>> [    5.400697]  <TASK>
>> [    5.402778]  free_unref_page+0x1b/0xf0
>> [    5.406505]  __vunmap+0x216/0x2c0
>> [    5.409798]  drm_fbdev_cleanup+0x5f/0xb0
>> [    5.413698]  drm_fbdev_fb_destroy+0x15/0x30
>> [    5.417857]  unregister_framebuffer+0x2c/0x40
>> [    5.422191]  drm_client_dev_unregister+0x69/0xe0
>> [    5.422962] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.17
>> [    5.426784]  drm_dev_unregister+0x2e/0x80
>> [    5.439005]  drm_dev_unplug+0x21/0x40
>> [    5.442645]  simpledrm_remove+0x11/0x20
>> [    5.446458]  platform_remove+0x1f/0x40
>> [    5.450185]  __device_release_driver+0x17a/0x250
>> [    5.454779]  device_release_driver+0x24/0x30
>> [    5.459024]  bus_remove_device+0xd8/0x140
>> [    5.463012]  device_del+0x18b/0x3f0
>> [    5.466478]  ? idr_alloc_cyclic+0x50/0xb0
>> [    5.470466]  platform_device_del.part.0+0x13/0x70
>> [    5.475146]  platform_device_unregister+0x1c/0x30
>> [    5.479824]  drm_aperture_detach_drivers+0xa1/0xd0
>> [    5.484593]  drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x60
>> [    5.491179]  radeon_pci_probe+0x54/0xf0 [radeon]
>> [    5.495773]  local_pci_probe+0x45/0x80
>> [    5.499499]  ? pci_match_device+0xd7/0x130
>> [    5.503572]  pci_device_probe+0xc2/0x1e0
>> [    5.507474]  really_probe+0x1f5/0x3d0
>> [    5.511112]  __driver_probe_device+0xfe/0x180
>> [    5.515446]  driver_probe_device+0x1e/0x90
>> [    5.519518]  __driver_attach+0xc0/0x1c0
>> [    5.523332]  ? __device_attach_driver+0xe0/0xe0
>> [    5.527839]  ? __device_attach_driver+0xe0/0xe0
>> [    5.532346]  bus_for_each_dev+0x78/0xc0
>> [    5.536159]  bus_add_driver+0x149/0x1e0
>> [    5.539973]  driver_register+0x8f/0xe0
>> [    5.543699]  ? 0xffffffffc0741000
>> [    5.546992]  do_one_initcall+0x44/0x200
>> [    5.550806]  ? kmem_cache_alloc_trace+0x170/0x2c0
>> [    5.555487]  do_init_module+0x4c/0x240
>> [    5.559213]  __do_sys_finit_module+0xb4/0x120
>> [    5.563547]  __do_fast_syscall_32+0x6b/0xe0
>> [    5.567706]  do_fast_syscall_32+0x2f/0x70
>> [    5.571693]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
>> [    5.577067] RIP: 0023:0xf7efa549
>> [    5.580273] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 cd 0f 05 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
>> [    5.582805] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
>> [    5.598992] RSP: 002b:00000000ff831c0c EFLAGS: 00200296 ORIG_RAX: 000000000000015e
>> [    5.598996] RAX: ffffffffffffffda RBX: 0000000000000011 RCX: 00000000f7ed9e09
>> [    5.598998] RDX: 0000000000000000 RSI: 0000000056a5c940 RDI: 0000000056a5c4c0
>> [    5.598999] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> [    5.635047] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> [    5.642154] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>> [    5.649264]  </TASK>
>> [    5.651427] Modules linked in: crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi radeon(+) r8169 xhci_pci(+) realtek snd_hda_intel drm_ttm_helper snd_intel_dspcfg k10temp snd_hda_codec ttm snd_hda_core xhci_hcd snd_pcm sg ohci_hcd ehci_pci(+) snd_timer drm_dp_helper snd ehci_hcd soundcore i2c_piix4 acpi_cpufreq coreboot_table fuse ipv6 autofs4
>> [    5.690975] r8169 0000:04:00.0 enp4s0: renamed from eth0
>> [    5.691589] ---[ end trace 0000000000000000 ]---
>> [    5.704791] RIP: 0010:free_pcp_prepare+0x295/0x400
>> [    5.709784] Code: 00 01 00 75 0b 48 8b 45 08 45 31 ff a8 01 74 4b 48 8b 45 00 a9 00 00 01 00 75 22 48 c7 c6 68 30 11 96 48 89 ef e8 cb 29 fd ff <0f> 0b 48 89 ef 41 83 c6 01 e8 bd f5 ff ff e9 2e fe ff ff 0f 1f 44
>> [    5.731535] RSP: 0018:ffffa6634062f9c0 EFLAGS: 00010246
>> [    5.752988] usb usb4: Product: xHCI Host Controller
>> [    5.758571] usb usb4: Manufacturer: Linux 5.17.0-10753-g1b351a77ed33 xhci-hcd
>> [    5.767096] usb usb4: SerialNumber: 0000:03:00.0
>> [    5.772213] hub 4-0:1.0: USB hub found
>> [    5.782383] hub 4-0:1.0: 2 ports detected
>> [    5.799251] RAX: 000000000000004e RBX: ffffe4be80000000 RCX: 0000000000000000
>> [    5.810470] RDX: 0000000000000000 RSI: ffffffff96136a37 RDI: 00000000ffffffff
>> [    5.817561] RBP: ffffe4be840c0000 R08: 0000000000000000 R09: 00000000ffffdfff
>> [    5.824680] R10: ffffa6634062f7f0 R11: ffffffff9652c4a8 R12: 0000000000000000
>> [    5.831739] R13: 0000000000000009 R14: ffff91fd02ebc640 R15: ffffe4be840c0000
>> [    5.839445] FS:  0000000000000000(0000) GS:ffff91fd7b500000(0063) knlGS:00000000f7eea800
>> [    5.847905] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
>> [    5.854025] CR2: 000000005664d26c CR3: 0000000106b60000 CR4: 00000000000406e0
>> ```

>> PS: For some reason, the lore.kernel.org lists most messages twice [1].
>>
>> PPS: I am actually wanted to analyze the new regression, and thought
>> your patch might help, but made it worse. ;-) (The log excerpt is from
>> Linux master.)
>>
>> ```
>> [    1.738965] BUG: Bad page state in process systemd-udevd  pfn:103003
>> [    1.738974] fbcon: Taking over console
>> [    1.740459] page:00000000c3b5c591 refcount:0 mapcount:0 mapping:0000000 000000000 index:0x3 pfn:0x103003
>> [    1.740466] head:000000009b49a8e9 order:9 compound_mapcount:0 compound_pincount:0
>> [    1.740468] flags: 0x2fffc000010000(head|node=0|zone=2|lastcpupid=0x3ff f)
>> [    1.740473] raw: 002fffc000000000 fffff139840c0001 fffff139840c00c8 000 0000000000000
>> [    1.740475] raw: 0000000000000000 0000000000000000 00000000ffffffff 000 0000000000000
>> [    1.740477] head: 002fffc000010000 0000000000000000 dead000000000122 00 00000000000000
>> [    1.740479] head: 0000000000000000 0000000000000000 00000000ffffffff 00 00000000000000
>> [    1.740480] page dumped because: corrupted mapping in tail page
>> ```
>>
>> I am going to do that in another thread.

This is [2].


Kind regards,

Paul


>> [1]: https://lore.kernel.org/all/20220317054602.28846-1-chuansheng.liu@intel.com/
[2]: 
https://lore.kernel.org/bpf/7edcd673-decf-7b4e-1f6e-f2e0e26f757a@molgen.mpg.de/


More information about the dri-devel mailing list