[PATCH] fb: Fix memcpy abuse

Sat May 21 05:03:36 PDT 2011

On Sat, Apr 30, 2011 at 3:39 PM, Soeren Sandmann <sandmann at cs.au.dk> wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>
>>>    1            2                 3                 4            Operation
>>> ------   ---------------   ---------------   ---------------   ------------
>>> 258000   269000 (  1.04)   544000 (  2.11)   552000 (  2.14)   Copy 10x10
>>>  21300    23000 (  1.08)    43700 (  2.05)    47100 (  2.21)   Copy 100x100
>>>    960      962 (  1.00)     1990 (  2.09)     1990 (  2.07)   Copy 500x500
>>>
>>> So it's a modest performance hit, but correctness demands it, and it's
>>> probably worth keeping the 2x speedup from having the fast path in the
>>> first place.
>>
>> In my opinion, still the best solution is to do this stuff in pixman,
>> because it has its own SIMD optimized code and can do a lot better job
>> than memcpy/memmove (for example, it can be improved to use MOVNTx
>> instructions for x86 when scrolling large areas exceeding L2/L3 cache
>> size, and this optimization can't be easily done by glibc if
>> memcpy/memmove is used separately for each individual scanline). I did
>> some experiments with improving scrolling performance when using
>> non-hardware accelerated framebuffer earlier and it showed a really
>> major speedup on ARM:
>> http://lists.x.org/archives/xorg-devel/2009-November/003536.html
>
> If I remember correctly, the main objection I had to the overlapped blt
> patch was that it inlined the C fallback into all the SIMD versions
> instead of just calling down through the delegate.

That's quite easy to fix. Too bad that this problem was brought up on
the xorg-devel mailing list a few weeks too late for overlapped
pixman_blt to be introduced in pixman 0.22.0. My main concern earlier
was how to use newly added pixman features in xserver and provide a
smooth upgrade path without breaking anything. But now I understand
that it just requires adding this feature to pixman, waiting until the
next stable pixman version 0.24.0 goes out, then adding the needed
changes to xserver to use this feature and bumping the required pixman
version to 0.24.0 at the same time. Finally, users will be able to
enjoy faster non-hardware-accelerated scrolling after the next stable
xserver version gets released and adopted by Linux distros.
Right?

> Other than that, I agree that doing this in pixman would be best. In
> fact, I think it would make sense to have a full CopyArea implementation
> in pixman that would also handle rasterops and planemasks etc, but
> simply supporting overlapping in pixman_blt() is useful too.

Sure. It's just that these other things are much less commonly used
and do not require any urgent attention. That's more code
refactoring/cleanup work than something clearly beneficial for the
end users.
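
For illustration, the per-scanline approach discussed above can be made
overlap-safe roughly as follows. This is only a minimal sketch, not the
pixman or xserver code: the function name overlap_safe_copy, the 32 bpp
pixel format and the stride counted in uint32_t units are all
assumptions made for the example. It picks the row order from the
vertical direction of the copy and relies on memmove for any overlap
within a single row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a pixman or xserver API): copy a
 * width x height rectangle inside one 32 bpp buffer, where the source
 * and destination rectangles may overlap.
 * Assumption: 'stride' is the row pitch counted in uint32_t units.
 */
static void
overlap_safe_copy (uint32_t *bits, int stride,
                   int src_x, int src_y,
                   int dst_x, int dst_y,
                   int width, int height)
{
    size_t row_bytes = (size_t) width * sizeof (uint32_t);
    int y;

    /* Each row must fit inside one scanline of the buffer. */
    assert (width >= 0 && height >= 0);
    assert (src_x + width <= stride && dst_x + width <= stride);

    if (dst_y <= src_y)
    {
        /* Moving up (or staying on the same rows): go top to bottom,
         * so every source row is read before it gets overwritten. */
        for (y = 0; y < height; y++)
        {
            memmove (bits + (size_t) (dst_y + y) * stride + dst_x,
                     bits + (size_t) (src_y + y) * stride + src_x,
                     row_bytes);
        }
    }
    else
    {
        /* Moving down: go bottom to top for the same reason. */
        for (y = height - 1; y >= 0; y--)
        {
            memmove (bits + (size_t) (dst_y + y) * stride + dst_x,
                     bits + (size_t) (src_y + y) * stride + src_x,
                     row_bytes);
        }
    }
}
```

A SIMD implementation inside pixman would presumably replace the
per-row memmove, but it would still need the same choice of copy
direction.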

-- 
Best regards,
Siarhei Siamashka