[PATCH] fbdev: defio: fix the pagelist corruption
Liu, Chuansheng
chuansheng.liu at intel.com
Thu Mar 31 00:06:14 UTC 2022
Hi Paul,
> -----Original Message-----
> From: Paul Menzel <pmenzel at molgen.mpg.de>
> Sent: Thursday, March 31, 2022 12:47 AM
> To: Liu, Chuansheng <chuansheng.liu at intel.com>
> Cc: tzimmermann at suse.de; linux-fbdev at vger.kernel.org; deller at gmx.de; dri-devel at lists.freedesktop.org
> Subject: Re: [PATCH] fbdev: defio: fix the pagelist corruption
>
> [Cc: -jayalk at intworks.biz as it bounces]
>
> Dear Chuansheng,
>
>
> Am 29.03.22 um 01:58 schrieb Liu, Chuansheng:
>
> >> -----Original Message-----
> >> From: Paul Menzel
> >> Sent: Monday, March 28, 2022 2:15 PM
>
> >> Am 28.03.22 um 02:58 schrieb Liu, Chuansheng:
> >>
> >>>> -----Original Message-----
> >>
> >>>> Sent: Saturday, March 26, 2022 4:11 PM
> >>
> >>>> Am 17.03.22 um 06:46 schrieb Chuansheng Liu:
> >>>>> The list corruption below is easy to hit:
> >>>>> ==
> >>>>> list_add corruption. prev->next should be next (ffffffffc0ceb090), but
> >>>>> was ffffec604507edc8. (prev=ffffec604507edc8).
> >>>>> WARNING: CPU: 65 PID: 3959 at lib/list_debug.c:26
> >>>>> __list_add_valid+0x53/0x80
> >>>>> CPU: 65 PID: 3959 Comm: fbdev Tainted: G U
> >>>>> RIP: 0010:__list_add_valid+0x53/0x80
> >>>>> Call Trace:
> >>>>> <TASK>
> >>>>> fb_deferred_io_mkwrite+0xea/0x150
> >>>>> do_page_mkwrite+0x57/0xc0
> >>>>> do_wp_page+0x278/0x2f0
> >>>>> __handle_mm_fault+0xdc2/0x1590
> >>>>> handle_mm_fault+0xdd/0x2c0
> >>>>> do_user_addr_fault+0x1d3/0x650
> >>>>> exc_page_fault+0x77/0x180
> >>>>> ? asm_exc_page_fault+0x8/0x30
> >>>>> asm_exc_page_fault+0x1e/0x30
> >>>>> RIP: 0033:0x7fd98fc8fad1
> >>>>> ==
> >>>>>
> >>>>> The race happens when one process is adding &page->lru to
> >>>>> the pagelist tail in fb_deferred_io_mkwrite() while another process is
> >>>>> re-initializing the same &page->lru in fb_deferred_io_fault(), which is
> >>>>> not protected by the lock.
> >>>>>
> >>>>> This fix initializes all the page lists once during initialization.
> >>>>> It not only fixes the list corruption, but also avoids redundant
> >>>>> INIT_LIST_HEAD() calls.
> >>>>>
> >>>>> Fixes: 105a940416fc ("fbdev/defio: Early-out if page is already enlisted")
> >>>>> Cc: Thomas Zimmermann <tzimmermann at suse.de>
> >>>>> Signed-off-by: Chuansheng Liu <chuansheng.liu at intel.com>
> >>>>> ---
> >>>>> drivers/video/fbdev/core/fb_defio.c | 9 ++++++++-
> >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> >>>>> index 98b0f23bf5e2..eafb66ca4f28 100644
> >>>>> --- a/drivers/video/fbdev/core/fb_defio.c
> >>>>> +++ b/drivers/video/fbdev/core/fb_defio.c
> >>>>> @@ -59,7 +59,6 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
> >>>>> printk(KERN_ERR "no mapping available\n");
> >>>>>
> >>>>> BUG_ON(!page->mapping);
> >>>>> - INIT_LIST_HEAD(&page->lru);
> >>>>> page->index = vmf->pgoff;
> >>>>>
> >>>>> vmf->page = page;
> >>>>> @@ -220,6 +219,8 @@ static void fb_deferred_io_work(struct work_struct *work)
> >>>>> void fb_deferred_io_init(struct fb_info *info)
> >>>>> {
> >>>>> struct fb_deferred_io *fbdefio = info->fbdefio;
> >>>>> + struct page *page;
> >>>>> + int i;
> >>>>>
> >>>>> BUG_ON(!fbdefio);
> >>>>> mutex_init(&fbdefio->lock);
> >>>>> @@ -227,6 +228,12 @@ void fb_deferred_io_init(struct fb_info *info)
> >>>>> INIT_LIST_HEAD(&fbdefio->pagelist);
> >>>>> if (fbdefio->delay == 0) /* set a default of 1 s */
> >>>>> fbdefio->delay = HZ;
> >>>>> +
> >>>>> + /* initialize all the page lists one time */
> >>>>> + for (i = 0; i < info->fix.smem_len; i += PAGE_SIZE) {
> >>>>> + page = fb_deferred_io_page(info, i);
> >>>>> + INIT_LIST_HEAD(&page->lru);
> >>>>> + }
> >>>>> }
> >>>>> EXPORT_SYMBOL_GPL(fb_deferred_io_init);
> >>>>>
> >>>> Applying your patch on top of current Linus’ master branch, tty0 is
> >>>> unusable and looks frozen. Sometimes network card still works, sometimes
> >>>> not.
> >>>
> >>> I don't see how the patch would cause the BUG call stack below; I need some
> >>> time to debug. Just a few comments:
> >>> 1. Will the system work well without this patch?
> >>
> >> Yes, the framebuffer works well without the patch.
> >>
> >>> 2. If you are sure the patch causes the regression you saw, please feel free
> >>> to submit a revert patch, thanks : )
> >>
> >> I think your patch wasn’t submitted yet – at least not pulled by Linus.
> > The patch is already in drm-tip. Could you try the latest drm-tip to see whether the
> > framebuffer works well? In that case, we could revert it in drm-tip then.
>
> With drm-tip (drm-tip: 2022y-03m-29d-13h-14m-35s UTC integration
> manifest) everything works fine. (I had to disable amdgpu driver, as it
> failed to build.) Is anyone able to explain that?
My patch fixes another patch, which is at least in drm-tip, so I assume
applying my patch directly to Linus' tree is not entirely appropriate.
That is why I asked for your help retesting with drm-tip.
When you say everything works fine, do you mean the other issue you hit is gone as well?
Best Regards
Chuansheng