sse2: Add a fast path for OVER 8888 x 8 x 8888
Matt Turner
mattst88 at gmail.com
Wed Nov 11 13:03:08 PST 2009
On Tue, Nov 10, 2009 at 6:39 PM, Soeren Sandmann <sandmann at daimi.au.dk> wrote:
> Hi,
>
> Here:
>
> http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=sse_8888_8_8888
>
> is a patch that adds an sse2 8888 x 8 x 8888 fast path. This is a
> small speedup on the swfdec-youtube benchmark:
>
> Before:
> [ 0] image swfdec-youtube 5.789 5.806 0.20% 6/6
>
> After:
> [ 0] image swfdec-youtube 5.489 5.524 0.27% 6/6
>
> I.e., approximately 5% faster.
>
> Please check that I didn't miss anything.
I asked on the flatassembler.net forums for a review of this code. See
http://board.flatassembler.net/topic.php?t=10839#104485
As suggested in that thread, what performance difference do you see
if you move the cache prefetch outside of the main loop and remove
the other prefetches?
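
To make the question concrete, here is a rough sketch in plain C of the two
variants being compared. This is not the pixman SSE2 code: the function
names, the PREFETCH macro, and the scalar per-pixel arithmetic are all
illustrative assumptions, and __builtin_prefetch (GCC/Clang) stands in for
whatever prefetch the real patch issues. A prefetch is only a hint, so both
variants must produce identical pixels; only the timing can differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a cache hint on GCC/Clang, a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch((p))
#else
#define PREFETCH(p) ((void)0)
#endif

/* Rounded 8-bit multiply: (a * b) / 255. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* (src IN mask) OVER dst for one premultiplied a8r8g8b8 pixel. */
static uint32_t over_8888_8_8888_pixel(uint32_t src, uint8_t m, uint32_t dst)
{
    uint32_t result = 0;
    uint8_t sa = mul_un8((uint8_t)(src >> 24), m); /* masked source alpha */
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint8_t s = mul_un8((uint8_t)(src >> shift), m);
        uint8_t d = mul_un8((uint8_t)(dst >> shift), (uint8_t)(255 - sa));
        uint32_t sum = (uint32_t)s + d;
        if (sum > 255)
            sum = 255; /* saturate, as a packed saturating add would */
        result |= sum << shift;
    }
    return result;
}

/* Variant A: prefetch hints issued on every iteration of the main loop. */
static void composite_prefetch_in_loop(uint32_t *dst, const uint32_t *src,
                                       const uint8_t *mask, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        PREFETCH(&src[i]);
        PREFETCH(&mask[i]);
        PREFETCH(&dst[i]);
        dst[i] = over_8888_8_8888_pixel(src[i], mask[i], dst[i]);
    }
}

/* Variant B: hints issued once before the loop, none inside it. */
static void composite_prefetch_hoisted(uint32_t *dst, const uint32_t *src,
                                       const uint8_t *mask, size_t n)
{
    size_t i;
    PREFETCH(src);
    PREFETCH(mask);
    PREFETCH(dst);
    for (i = 0; i < n; i++)
        dst[i] = over_8888_8_8888_pixel(src[i], mask[i], dst[i]);
}
```

Timing the two functions over the same scanlines would show whether the
per-iteration hints pay for themselves on a given CPU.
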
Please take a look at the other suggestions in the thread as well.
Thanks,
Matt