[Mesa-dev] [PATCH 7/7] st/mesa: cache staging texture for glReadPixels
Nicolai Hähnle
nhaehnle at gmail.com
Tue Jun 21 08:53:18 UTC 2016
Hi Ilia,
I'm going to push this with your ST_DEBUG suggestion. I did some more
micro-benchmarking with the attached test, which confirmed that the
current heuristic is decent for radeonsi. One might get slightly better
results with a mixed heuristic that includes the number of readpixels
calls, but at some point it just becomes bike-shedding.
If you find that it's not a good match for nouveau in the transfer_map
case, perhaps we can follow up with a pipe cap.
Cheers,
Nicolai
On 15.06.2016 17:20, Ilia Mirkin wrote:
> On Wed, Jun 15, 2016 at 4:38 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>>
>> ---
>> src/mesa/state_tracker/st_cb_fbo.h | 2 +
>> src/mesa/state_tracker/st_cb_readpixels.c | 122 ++++++++++++++++++++++++++----
>> 2 files changed, 110 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_cb_fbo.h b/src/mesa/state_tracker/st_cb_fbo.h
>> index f3b310b..414a661 100644
>> --- a/src/mesa/state_tracker/st_cb_fbo.h
>> +++ b/src/mesa/state_tracker/st_cb_fbo.h
>> @@ -58,6 +58,8 @@ struct st_renderbuffer
>> boolean software;
>> void *data;
>>
>> + bool use_readpix_cache;
>> +
>> /* Inputs from Driver.RenderTexture, don't use directly. */
>> boolean is_rtt; /**< whether Driver.RenderTexture was called */
>> unsigned rtt_face, rtt_slice;
>> diff --git a/src/mesa/state_tracker/st_cb_readpixels.c b/src/mesa/state_tracker/st_cb_readpixels.c
>> index a501b7b..42d147f 100644
>> --- a/src/mesa/state_tracker/st_cb_readpixels.c
>> +++ b/src/mesa/state_tracker/st_cb_readpixels.c
>> @@ -46,6 +46,28 @@
>> #include "state_tracker/st_pbo.h"
>> #include "state_tracker/st_texture.h"
>>
>> +/* The readpixels cache caches a blitted staging texture so that back-to-back
>> + * calls to glReadPixels with user pointers require less CPU-GPU synchronization.
>> + *
>> + * Assumptions:
>> + *
>> + * (1) Blits have high synchronization overheads, and it is beneficial to
>> + * use a single blit of the entire framebuffer instead of many smaller
>> + * blits (because the smaller blits cannot be batched, and we have to wait
>> + * for the GPU after each one).
>> + *
>> + * (2) transfer_map implicitly involves a blit as well (for de-tiling, copy
>> + * from VRAM, etc.), so that it is beneficial to replace the
>> + * _mesa_readpixels path as well when possible.
>
> FWIW this second point isn't the case for nouveau. We map the texture
> directly under many conditions. However that's still slow since VRAM
> is "far away", and, without having done the tests, this seems like it
> would improve the situation for nouveau as well.
>
> BTW, what are some good tests to play with for testing this out?
>
>> + *
>> + */
>> +#define USE_READPIXELS_CACHE true
>
> Maybe add it to ST_DEBUG instead (so it can be disabled)? Your call.
>
> -ilia
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readpixels-bench.cpp
Type: text/x-c++src
Size: 2277 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160621/ca58ad71/attachment.cpp>
More information about the mesa-dev
mailing list