[Mesa-dev] [PATCH 2/2] mesa: Speedup the xrgb -> argb special case in fast_read_rgba_pixels_memcpy

Mon Mar 11 12:00:57 PDT 2013

Ian Romanick <idr at freedesktop.org> writes:

> On 03/11/2013 07:56 AM, Jose Fonseca wrote:
>> I'm surprised this is faster.
>>
>> In particular, for big things we'll be touching memory twice.
>>
>> Did you measure the speed up?
>
> The second hit is cache-hot, so it may not be too expensive.  I suspect 
> memcpy is optimized to fill the cache in a more efficient manner than 
> the old loop.  Since the old loop did a read and a bit-wise or, it's 
> also possible the compiler generated some really dumb code.  We'd have 
> to look at the assembly output to know.

This is readpixels.  You are probably reading from uncached memory
(assuming the driver didn't do something clever), so you want the
biggest possible word read at a time (memcpy, not 32 bits in a loop), or
if you're on a Core 2 or better CPU, you want to use movntdqa for the
read so you get streaming performance.
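
To make the two approaches under discussion concrete, here is a minimal
sketch, not the code from the patch.  The function names are made up for
illustration, and it assumes 32-bit pixels with alpha in the top byte
(hence the 0xff000000 mask).  The first version reads one 32-bit word per
pixel from the source; the second lets memcpy pick the widest reads it
can and then touches only the destination a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: alpha occupies the top 8 bits of each 32-bit pixel. */
#define ALPHA_MASK 0xff000000u

/* Per-pixel loop: one 32-bit read from src, OR in alpha, one write. */
void xrgb_to_argb_loop(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | ALPHA_MASK;
}

/* memcpy variant: bulk-copy the row, then set alpha in a second pass
 * that only reads and writes dst (ordinary cached memory). */
void xrgb_to_argb_memcpy(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(*dst));
    for (size_t i = 0; i < n; i++)
        dst[i] |= ALPHA_MASK;
}
```

Both produce identical output; the only difference is the access
pattern on src, which is what matters if src is an uncached mapping.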

If anyone's interested, there's some code in the movntdqa branch of my
tree (for the ugly old span code, pre-automake), and in the movnt branch
of my tree (which does the automake integration and is much prettier,
but movntdqa is the instruction you want).
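
For reference, a rough sketch of what a movntdqa-based read can look
like.  This is not the code from either branch.  It assumes GCC or
Clang, a 16-byte-aligned source, and a size that is a multiple of 16;
copy_from_uncached and copy_streaming_sse41 are hypothetical names.
movntdqa is exposed as the SSE4.1 intrinsic _mm_stream_load_si128, and
the sketch falls back to memcpy when SSE4.1 isn't available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* Hypothetical helper: 16-byte streaming loads from src, plain
 * unaligned stores to dst.  Built for SSE4.1 regardless of -m flags. */
__attribute__((target("sse4.1")))
static void copy_streaming_sse41(void *dst, const void *src, size_t bytes)
{
    __m128i *d = (__m128i *)dst;
    /* Cast drops const: older headers declare the intrinsic's
     * argument as a non-const pointer. */
    __m128i *s = (__m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_stream_load_si128(s + i);
        _mm_storeu_si128(d + i, v);
    }
}
#endif

/* Copy bytes from a (presumably uncached or write-combined) mapping. */
void copy_from_uncached(void *dst, const void *src, size_t bytes)
{
    assert(bytes % 16 == 0);             /* sketch handles whole 16-byte blocks only */
    assert(((uintptr_t)src & 15) == 0);  /* movntdqa needs an aligned source */
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1")) {
        copy_streaming_sse41(dst, src, bytes);
        return;
    }
#endif
    memcpy(dst, src, bytes);             /* fallback: let libc pick the widest reads */
}
```

On ordinary cached memory the streaming load behaves like a normal
load, so any win would only show up when src really is a
write-combined mapping.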