[PATCH v2] Try and get overlapping cases fixed.
Jeremy Huddleston
jeremyhu at apple.com
Mon May 16 09:26:52 PDT 2011
Is the one div needed for:
bpp / 8
bpp % 8
really universally faster than the two bitwise ops needed for
bpp >> 3
bpp & 0x7
? I'm sure most modern compilers will know how to optimize that based on the target CPU, but I've always tried to avoid doing mults and divs in fast paths where possible.
--Jeremy
On May 16, 2011, at 09:08, Cyril Brulebois wrote:
> From: Adam Jackson <ajax at redhat.com>
>
> The memcpy fast path implicitly assumes that the copy walks
> left-to-right. That's not something memcpy guarantees, and newer glibc
> on some processors will indeed break that assumption. Since we walk a
> line at a time, check the source and destination against the width of
> the blit to determine whether we can be sloppy enough to allow memcpy.
> (Having done this, we can remove the check for !reverse as well.)
>
> On an Intel Core i7-2630QM with an NVIDIA GeForce GTX 460M running in
> NoAccel, the broken code and various fixes for -copywinwin{10,100,500}
> gives (edited to fit in 80 columns):
>
> 1: Disable the fastpath entirely
> 2: Replace memcpy with memmove
> 3: This fix
> 4: The code before this fix
>
> 1 2 3 4 Operation
> ------ --------------- --------------- --------------- ------------
> 258000 269000 ( 1.04) 544000 ( 2.11) 552000 ( 2.14) Copy 10x10
> 21300 23000 ( 1.08) 43700 ( 2.05) 47100 ( 2.21) Copy 100x100
> 960 962 ( 1.00) 1990 ( 2.09) 1990 ( 2.07) Copy 500x500
>
> So it's a modest performance hit, but correctness demands it, and it's
> probably worth keeping the 2x speedup from having the fast path in the
> first place.
>
> Signed-off-by: Adam Jackson <ajax at redhat.com>
>
> v2: Fix limit cases thanks to Soeren Sandmann, and apply a tiny
> optimization by Walter Harms.
>
> Signed-off-by: Cyril Brulebois <kibi at debian.org>
> ---
> fb/fbblt.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
>
> Tested on amd64 on top of xorg-server's server-1.10-branch.
>
>
> diff --git a/fb/fbblt.c b/fb/fbblt.c
> index 38271c0..b6e7785 100644
> --- a/fb/fbblt.c
> +++ b/fb/fbblt.c
> @@ -65,6 +65,7 @@ fbBlt (FbBits *srcLine,
> int n, nmiddle;
> Bool destInvarient;
> int startbyte, endbyte;
> + int careful;
> FbDeclareMergeRop ();
>
> #ifdef FB_24BIT
> @@ -76,7 +77,9 @@ fbBlt (FbBits *srcLine,
> }
> #endif
>
> - if (alu == GXcopy && pm == FB_ALLONES && !reverse &&
> + careful = (width * (bpp / 8) > abs(srcLine-dstLine)) || (bpp % 8);
> +
> + if (alu == GXcopy && pm == FB_ALLONES && !careful &&
> !(srcX & 7) && !(dstX & 7) && !(width & 7)) {
> int i;
> CARD8 *src = (CARD8 *) srcLine;
> --
> 1.7.5.1
>
> _______________________________________________
> xorg-devel at lists.x.org: X.Org development
> Archives: http://lists.x.org/archives/xorg-devel
> Info: http://lists.x.org/mailman/listinfo/xorg-devel
>
More information about the xorg-devel
mailing list