[PATCH v2] Try and get overlapping cases fixed.

Mon May 16 09:26:52 PDT 2011

Is the one div needed for:

bpp / 8
bpp % 8

really universally faster than the two bitwise ops needed for

bpp >> 3
bpp & 0x7

?  I'm sure most modern compilers know how to optimize that for the target CPU, but I've always tried to avoid doing mults and divs in fast paths where possible.
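
To make the comparison concrete, here is a minimal sketch showing that the two spellings agree for every non-negative value. The helper names are hypothetical and not taken from fb/fbblt.c. For unsigned operands a compiler is free to turn the division into the shift. For a signed int the two forms only agree for non-negative values, because / rounds toward zero while >> on a negative value is implementation-defined, so a compiler usually adds a small fixup to the signed division.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of fb/fbblt.c. */
static unsigned bytes_by_div(unsigned bpp)   { return bpp / 8; }
static unsigned bytes_by_shift(unsigned bpp) { return bpp >> 3; }
static unsigned bits_by_mod(unsigned bpp)    { return bpp % 8; }
static unsigned bits_by_mask(unsigned bpp)   { return bpp & 0x7; }

/* Returns 1 if both spellings agree for every value in [0, max]. */
static int shift_matches_div(unsigned max)
{
    unsigned bpp;

    for (bpp = 0; bpp <= max; bpp++) {
        if (bytes_by_div(bpp) != bytes_by_shift(bpp))
            return 0;
        if (bits_by_mod(bpp) != bits_by_mask(bpp))
            return 0;
    }
    return 1;
}
```

Calling shift_matches_div(64) covers every depth the fb code is likely to see. Whether either form is measurably faster depends on the compiler and target, so comparing the generated assembly is the only reliable answer.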

--Jeremy

On May 16, 2011, at 09:08, Cyril Brulebois wrote:

> From: Adam Jackson <ajax at redhat.com>
> 
> The memcpy fast path implicitly assumes that the copy walks
> left-to-right.  That's not something memcpy guarantees, and newer glibc
> on some processors will indeed break that assumption.  Since we walk a
> line at a time, check the source and destination against the width of
> the blit to determine whether we can be sloppy enough to allow memcpy.
> (Having done this, we can remove the check for !reverse as well.)
> 
> On an Intel Core i7-2630QM with an NVIDIA GeForce GTX 460M running in
> NoAccel, the broken code and various fixes for -copywinwin{10,100,500}
> give (edited to fit in 80 columns):
> 
> 1: Disable the fastpath entirely
> 2: Replace memcpy with memmove
> 3: This fix
> 4: The code before this fix
> 
>   1             2                 3                 4          Operation
> ------   ---------------   ---------------   ---------------   ------------
> 258000   269000 (  1.04)   544000 (  2.11)   552000 (  2.14)   Copy 10x10
>  21300    23000 (  1.08)    43700 (  2.05)    47100 (  2.21)   Copy 100x100
>    960      962 (  1.00)     1990 (  2.09)     1990 (  2.07)   Copy 500x500
> 
> So it's a modest performance hit, but correctness demands it, and it's
> probably worth keeping the 2x speedup from having the fast path in the
> first place.
> 
> Signed-off-by: Adam Jackson <ajax at redhat.com>
> 
> v2: Fix limit cases thanks to Soeren Sandmann, and apply a tiny
>    optimization by Walter Harms.
> 
> Signed-off-by: Cyril Brulebois <kibi at debian.org>
> ---
> fb/fbblt.c |    5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
> 
> 
> Tested on amd64 on top of xorg-server's server-1.10-branch.
> 
> 
> diff --git a/fb/fbblt.c b/fb/fbblt.c
> index 38271c0..b6e7785 100644
> --- a/fb/fbblt.c
> +++ b/fb/fbblt.c
> @@ -65,6 +65,7 @@ fbBlt (FbBits   *srcLine,
>     int	    n, nmiddle;
>     Bool    destInvarient;
>     int	    startbyte, endbyte;
> +    int     careful;
>     FbDeclareMergeRop ();
> 
> #ifdef FB_24BIT
> @@ -76,7 +77,9 @@ fbBlt (FbBits   *srcLine,
>     }
> #endif
> 
> -    if (alu == GXcopy && pm == FB_ALLONES && !reverse &&
> +    careful = (width * (bpp / 8) > abs(srcLine-dstLine)) || (bpp % 8);
> +
> +    if (alu == GXcopy && pm == FB_ALLONES && !careful &&
>             !(srcX & 7) && !(dstX & 7) && !(width & 7)) {
>         int i;
>         CARD8 *src = (CARD8 *) srcLine;
> -- 
> 1.7.5.1
> 
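
For reference, the overlap reasoning in the commit message can be sketched outside the server. This is only an illustration under stated assumptions: ranges_overlap() and copy_row() are hypothetical names, lengths are in bytes, and the sketch falls back to memmove where the patch instead falls back to the general (non-memcpy) path. The distance is computed on uintptr_t values so that the comparison does not rely on subtracting pointers into possibly different objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not from fb/fbblt.c: returns 1 when the byte
 * ranges [src, src + len) and [dst, dst + len) share at least one byte.
 */
static int ranges_overlap(const unsigned char *src,
                          const unsigned char *dst, size_t len)
{
    uintptr_t s = (uintptr_t) src;
    uintptr_t d = (uintptr_t) dst;
    uintptr_t dist = s > d ? s - d : d - s;

    /* Two equal-length ranges overlap iff their starts are closer than len. */
    return dist < len;
}

/* Copy one row, using memcpy only when it is guaranteed to be safe. */
static void copy_row(unsigned char *dst, const unsigned char *src, size_t len)
{
    if (ranges_overlap(src, dst, len))
        memmove(dst, src, len);   /* defined for overlapping ranges */
    else
        memcpy(dst, src, len);    /* undefined if the ranges overlap */
}
```

With an 8-byte buffer holding "abcdefgh", copy_row(buf + 2, buf, 4) takes the memmove branch and leaves "ababcdgh", whereas memcpy on the same arguments would have undefined behavior.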