fbdev: Garbage collect fbdev scrolling acceleration
Sven Schnelle
svens at stackframe.org
Wed Jan 19 16:33:53 UTC 2022
Hi Daniel,
Daniel Vetter <daniel at ffwll.ch> writes:
> On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
>> Hi Daniel,
>>
>> Daniel Vetter <daniel at ffwll.ch> writes:
>>
>> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> >> Helge Deller <deller at gmx.de> writes:
>> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> >> > but for all old systems, or when emulated in qemu, this makes
>> >> > a big difference.
>> >> >
>> >> > Helge
>> >>
>> >> I second that. For most people, the framebuffer isn't important as
>> >> they're mostly interested in getting to X11/wayland as fast as possible.
>> >> But for systems like servers without X11 it's nice to have a fast
>> >> console.
>> >
>> > Fast console howto:
>> > - shadow buffer in cached memory
>> > - timer based upload of changed areas to the real framebuffer
>> >
>> > This one is actually fast, instead of trying to use hw bltcopy and having
>> > the most terrible fallback path if that's gone. Yes drm fbdev helpers has
>> > this (but not enabled on most drivers because very, very few people care).
>>
>> Hmm.... Take my Laptop with a 4k (3180x2160) screen as an example:
>>
>> Lets say on average the half of every line is filled with text.
>>
>> So 3840/2*2160 pixels that change = 4147200 pixels. Every pixel takes 4
>> bytes = 16,588800 bytes per timer interrupt. In another Mail updating on
>> vsync was mentioned, so multiply that by 60 and get ~927MB. And even if
>> you only update the screen ony 4 times per second, that would be ~64MB
>> of data. I'm likely missing something here.
>
> Since you say 4k it's a modern box, so you have on the order of 10GB/s of
> write bandwidth.
>
> And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
> It's that uncached read which kills you and means dmesg takes seconds to
> display.
>
> Also since this is 4k looking at sales volume we're talking integrated, so
> whether it's the gpu or the cpu that's doing the memcpy, it's the same
> memory bw budget you're burning down.
That might be true for integrated graphics, as said, i don't know the
architecture. But saying it's good just because it's good on one
architecture doesn't mean it's good for everyone. If you have an
external GPU, than the memory/system bus BW would be different whether
it's memcpy or the GPU doing the scrolling. And whether internal or external
graphics - the CPU could do other stuff while the GPU scrolls stuff.
Quite a lot of discussion for a revert of a patch that was already in
the kernel for more than 20(?) years.
/Sven
More information about the dri-devel
mailing list