[Mesa-dev] [PATCH 7/7] st/mesa: cache staging texture for glReadPixels

Nicolai Hähnle nhaehnle at gmail.com
Tue Jun 21 08:53:18 UTC 2016


Hi Ilia,

I'm going to push this with your ST_DEBUG suggestion. I did some more 
micro-benchmarking with the attached test, which confirmed that the 
current heuristic is decent for radeonsi. One might get slightly better 
results with a mixed heuristic that includes the number of readpixels 
calls, but at some point it just becomes bike-shedding.

If you find that it's not a good match for nouveau in the transfer_map 
case, perhaps we can follow up with a pipe cap.

Cheers,
Nicolai

On 15.06.2016 17:20, Ilia Mirkin wrote:
> On Wed, Jun 15, 2016 at 4:38 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>
>> ---
>>   src/mesa/state_tracker/st_cb_fbo.h        |   2 +
>>   src/mesa/state_tracker/st_cb_readpixels.c | 122 ++++++++++++++++++++++++++----
>>   2 files changed, 110 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_fbo.h b/src/mesa/state_tracker/st_cb_fbo.h
>> index f3b310b..414a661 100644
>> --- a/src/mesa/state_tracker/st_cb_fbo.h
>> +++ b/src/mesa/state_tracker/st_cb_fbo.h
>> @@ -58,6 +58,8 @@ struct st_renderbuffer
>>      boolean software;
>>      void *data;
>>
>> +   bool use_readpix_cache;
>> +
>>      /* Inputs from Driver.RenderTexture, don't use directly. */
>>      boolean is_rtt; /**< whether Driver.RenderTexture was called */
>>      unsigned rtt_face, rtt_slice;
>> diff --git a/src/mesa/state_tracker/st_cb_readpixels.c b/src/mesa/state_tracker/st_cb_readpixels.c
>> index a501b7b..42d147f 100644
>> --- a/src/mesa/state_tracker/st_cb_readpixels.c
>> +++ b/src/mesa/state_tracker/st_cb_readpixels.c
>> @@ -46,6 +46,28 @@
>>   #include "state_tracker/st_pbo.h"
>>   #include "state_tracker/st_texture.h"
>>
>> +/* The readpixels cache caches a blitted staging texture so that back-to-back
>> + * calls to glReadPixels with user pointers require less CPU-GPU synchronization.
>> + *
>> + * Assumptions:
>> + *
>> + * (1) Blits have high synchronization overheads, and it is beneficial to
>> + *     use a single blit of the entire framebuffer instead of many smaller
>> + *     blits (because the smaller blits cannot be batched, and we have to wait
>> + *     for the GPU after each one).
>> + *
>> + * (2) transfer_map implicitly involves a blit as well (for de-tiling, copy
>> + *     from VRAM, etc.), so that it is beneficial to replace the
>> + *     _mesa_readpixels path as well when possible.
>
> FWIW this second point isn't the case for nouveau. We map the texture
> directly under many conditions. However that's still slow since VRAM
> is "far away", and, without having done the tests, this seems like it
> would improve the situation for nouveau as well.
>
> BTW, what are some good tests to play with for testing this out?
>
>> + *
>> + */
>> +#define USE_READPIXELS_CACHE true
>
> Maybe add it to ST_DEBUG instead (so it can be disabled)? Your call.
>
>    -ilia
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readpixels-bench.cpp
Type: text/x-c++src
Size: 2277 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160621/ca58ad71/attachment.cpp>


More information about the mesa-dev mailing list