[Mesa-dev] [PATCH 2/2] mesa: Speedup the xrgb -> argb special case in fast_read_rgba_pixels_memcpy

Tue Mar 12 09:37:54 PDT 2013

On 12.03.2013 02:23, Marek Olšák wrote:
> On Mon, Mar 11, 2013 at 6:49 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> Once upon a time Matt Turner was talking about using pixman to accelerate
>> operations like this in Mesa.  It has a lot of highly optimized paths for
>> just this sort of thing.  Since it's used by other projects, it gets a lot
>> more testing, etc.  It may be worth looking at using that to solve this
>> problem.
> 
> I think that using pixman or any other CPU-based solution is a waste
> of time (for dedicated GPUs at least). The OpenGL packing and
> unpacking can be implemented entirely on the GPU using streamout and
> TBOs, and we generally only want memcpy on the CPU side. That would
> also allow us to finally accelerate pixel buffer objects.
> 
> For now, the easiest and fastest solution is to do a blit, which
> should cover swizzling and format conversions. We just need a lot of
> texture formats or do swizzling in the fragment shader. The
> destination of the blit can be a temporary texture allocated in GTT.
> The author of the patch (at least I think it's him) has actually
> started working on the blit-based solution for ReadPixels in st/mesa
> and the time spent in ReadPixels went from 2300 ms to 9 ms (so he can
> still use an additional 7.6 ms for rendering and be at 60 fps).

For format conversions you're probably right. However, if you only need
a simple swizzle (like rgba->abgr or some such), the CPU solution should
be faster, because last time I checked swizzling is essentially free:
its cost is completely hidden by the cost of reading and writing memory.
You can easily swizzle 4 rgba8 pixels at once with a single CPU
instruction, at least if you've got SSSE3. It would require memcpy-like
optimizations for fetching and storing the values, though; essentially
you need a memcpy implementation with inlined swizzling. That is, unless
you need some blit on the GPU side in any case, in which case the
conversion there is also free.
I agree, though, that it is probably easier to do a blit, which covers
more cases (both because you can also do format conversion and because
you don't need a separate code path depending on the CPU).
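
To illustrate the "memcpy with inlined swizzling" idea, here is a rough
sketch only; it is not code from Mesa or from the patch. The function
name copy_swizzle_rgba_to_abgr is made up, src and dst are assumed not
to overlap, and a real version would also want the alignment and
streaming-store tricks a tuned memcpy uses. The SSSE3 path is only
compiled when the compiler defines __SSSE3__ (e.g. with -mssse3);
otherwise the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (pshufb) */
#endif

/* Hypothetical helper: copy n_pixels RGBA8 pixels from src to dst while
 * reversing the byte order inside each pixel (rgba -> abgr).
 * src and dst must not overlap. */
static void
copy_swizzle_rgba_to_abgr(uint8_t *dst, const uint8_t *src, size_t n_pixels)
{
   size_t i = 0;

#if defined(__SSSE3__)
   /* Shuffle control: output byte k takes the input byte listed for it.
    * _mm_set_epi8 lists bytes from the highest (15) down to the lowest (0),
    * so the low pixel maps 0<-3, 1<-2, 2<-1, 3<-0, and so on. */
   const __m128i mask = _mm_set_epi8(12, 13, 14, 15,
                                      8,  9, 10, 11,
                                      4,  5,  6,  7,
                                      0,  1,  2,  3);

   /* 4 pixels (16 bytes) per iteration: load, one shuffle, store. */
   for (; i + 4 <= n_pixels; i += 4) {
      __m128i px = _mm_loadu_si128((const __m128i *)(src + 4 * i));
      px = _mm_shuffle_epi8(px, mask);
      _mm_storeu_si128((__m128i *)(dst + 4 * i), px);
   }
#endif

   /* Scalar tail, and the full fallback when SSSE3 is not available. */
   for (; i < n_pixels; i++) {
      dst[4 * i + 0] = src[4 * i + 3];
      dst[4 * i + 1] = src[4 * i + 2];
      dst[4 * i + 2] = src[4 * i + 1];
      dst[4 * i + 3] = src[4 * i + 0];
   }
}
```

Other simple swizzles would only change the shuffle mask (and the scalar
loop to match); the load/store structure stays the same, which is why
the swizzle itself should be hidden behind the memory traffic.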

Roland