[PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
Andy Shevchenko
andriy.shevchenko at linux.intel.com
Thu Feb 10 16:15:10 UTC 2022
On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
>
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
>
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.
>
> Fix this by making the sorting step opt-in and update the few drivers
> that require it. All other drivers work with unsorted page lists. Pages
> are appended to the list. Therefore, in the common case of writing the
> framebuffer top to bottom, pages are still sorted by offset, which may
> have a positive effect on performance.
>
> Playing a video [1] in mplayer's benchmark mode shows the difference
> (i7-4790, FullHD, simpledrm, kernel with debugging).
>
> mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
>
> With sorted page lists:
>
> BENCHMARKs: VC: 32.960s VO: 73.068s A: 0.000s Sys: 2.413s = 108.441s
> BENCHMARK%: VC: 30.3947% VO: 67.3802% A: 0.0000% Sys: 2.2251% = 100.0000%
>
> With unsorted page lists:
>
> BENCHMARKs: VC: 31.005s VO: 42.889s A: 0.000s Sys: 2.256s = 76.150s
> BENCHMARK%: VC: 40.7156% VO: 56.3219% A: 0.0000% Sys: 2.9625% = 100.0000%
>
> VC shows the overhead of video decoding, VO shows the overhead of the
> video output. Using unsorted page lists reduces the benchmark's run time
> by ~32s/~25%.
Acked-by: Andy Shevchenko <andriy.shevchenko at linux.intel.com>
from fbtft perspective, thanks!
> Signed-off-by: Thomas Zimmermann <tzimmermann at suse.de>
> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
> ---
> drivers/staging/fbtft/fbtft-core.c | 1 +
> drivers/video/fbdev/broadsheetfb.c | 1 +
> drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
> drivers/video/fbdev/metronomefb.c | 1 +
> drivers/video/fbdev/udlfb.c | 1 +
> include/linux/fb.h | 1 +
> 6 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
> index f2684d2d6851..4a35347b3020 100644
> --- a/drivers/staging/fbtft/fbtft-core.c
> +++ b/drivers/staging/fbtft/fbtft-core.c
> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
> fbops->fb_blank = fbtft_fb_blank;
>
> fbdefio->delay = HZ / fps;
> + fbdefio->sort_pagelist = true;
> fbdefio->deferred_io = fbtft_deferred_io;
> fb_deferred_io_init(info);
>
> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
> index fd66f4d4a621..b9054f658838 100644
> --- a/drivers/video/fbdev/broadsheetfb.c
> +++ b/drivers/video/fbdev/broadsheetfb.c
> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>
> static struct fb_deferred_io broadsheetfb_defio = {
> .delay = HZ/4,
> + .sort_pagelist = true,
> .deferred_io = broadsheetfb_dpy_deferred_io,
> };
>
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 3727b1ca87b1..1f672cf253b2 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
> if (!list_empty(&page->lru))
> goto page_already_added;
>
> - /* we loop through the pagelist before adding in order
> - to keep the pagelist sorted */
> - list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> - if (cur->index > page->index)
> - break;
> + if (fbdefio->sort_pagelist) {
> + /*
> + * We loop through the pagelist before adding in order
> + * to keep the pagelist sorted.
> + */
> + list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> + if (cur->index > page->index)
> + break;
> + }
> + list_add_tail(&page->lru, &cur->lru);
> + } else {
> + list_add_tail(&page->lru, &fbdefio->pagelist);
> }
>
> - list_add_tail(&page->lru, &cur->lru);
> -
> page_already_added:
> mutex_unlock(&fbdefio->lock);
>
> diff --git a/drivers/video/fbdev/metronomefb.c b/drivers/video/fbdev/metronomefb.c
> index 952826557a0c..af858dd23ea6 100644
> --- a/drivers/video/fbdev/metronomefb.c
> +++ b/drivers/video/fbdev/metronomefb.c
> @@ -568,6 +568,7 @@ static const struct fb_ops metronomefb_ops = {
>
> static struct fb_deferred_io metronomefb_defio = {
> .delay = HZ,
> + .sort_pagelist = true,
> .deferred_io = metronomefb_dpy_deferred_io,
> };
>
> diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c
> index b9cdd02c1000..184bb8433b78 100644
> --- a/drivers/video/fbdev/udlfb.c
> +++ b/drivers/video/fbdev/udlfb.c
> @@ -980,6 +980,7 @@ static int dlfb_ops_open(struct fb_info *info, int user)
>
> if (fbdefio) {
> fbdefio->delay = DL_DEFIO_WRITE_DELAY;
> + fbdefio->sort_pagelist = true;
> fbdefio->deferred_io = dlfb_dpy_deferred_io;
> }
>
> diff --git a/include/linux/fb.h b/include/linux/fb.h
> index 3d7306c9a706..9a77ab615c36 100644
> --- a/include/linux/fb.h
> +++ b/include/linux/fb.h
> @@ -204,6 +204,7 @@ struct fb_pixmap {
> struct fb_deferred_io {
> /* delay between mkwrite and deferred handler */
> unsigned long delay;
> + bool sort_pagelist; /* sort pagelist by offset */
> struct mutex lock; /* mutex that protects the page list */
> struct list_head pagelist; /* list of touched pages */
> /* callback */
> --
> 2.34.1
>
--
With Best Regards,
Andy Shevchenko
More information about the dri-devel
mailing list