[Mesa-dev] [PATCH 4/4] i965: Implemente a tiled fast-path for glReadPixels and glGetTexImage

Chad Versace chad.versace at intel.com
Mon Jan 26 08:41:44 PST 2015


On 01/20/2015 12:09 PM, Jason Ekstrand wrote:
> 
> 
> On Tue, Jan 13, 2015 at 10:44 AM, Chad Versace
> <chad.versace at intel.com <mailto:chad.versace at intel.com>> wrote:
> 
> On 01/12/2015 10:22 AM, Jason Ekstrand wrote:
>> From: Sisinty Sasmita Patra <sisinty.patra at intel.com
>> <mailto:sisinty.patra at intel.com>>
>> 
>> Added intel_readpixels_tiled_mempcpy and
>> intel_gettexsubimage_tiled_mempcpy functions. These are the fast
>> paths for glReadPixels and glGetTexImage.
>> 
>> On chrome, using the RoboHornet 2D Canvas toDataURL test, this
>> patch cuts amount of time spent in glReadPixels by more than half
>> and reduces the time of the entire test by 10%.
>> 
>> v2: Jason Ekstrand <jason.ekstrand at intel.com
>> <mailto:jason.ekstrand at intel.com>> - Refactor to make the functions
>> look more like the old intel_tex_subimage_tiled_memcpy - Don't
>> export the readpixels_tiled_memcpy function - Fix some pointer
>> arithmatic bugs in partial image downloads (using ReadPixels with a
>> non-zero x or y offset) - Fix a bug when ReadPixels is performed on
>> an FBO wrapping a texture miplevel other than zero.
>> 
>> v3: Jason Ekstrand <jason.ekstrand at intel.com
>> <mailto:jason.ekstrand at intel.com>> - Better documentation fot the
>> *_tiled_memcpy functions - Add target restrictions for
>> renderbuffers wrapping textures
>> 
>> v4: Jason Ekstrand <jason.ekstrand at intel.com
>> <mailto:jason.ekstrand at intel.com>> - Only check the return value of
>> brw_bo_map for error and not bo->virtual
>> 
>> Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com
>> <mailto:jason.ekstrand at intel.com>> --- 
>> src/mesa/drivers/dri/i965/intel_pixel_read.c | 142
>> ++++++++++++++++++++++++++- src/mesa/drivers/dri/i965/intel_tex.h
>> |   9 ++ src/mesa/drivers/dri/i965/intel_tex_image.c  | 139
>> +++++++++++++++++++++++++- 3 files changed, 287 insertions(+), 3
>> deletions(-)
> 
> 
>> +/** + * \brief A fast path for glReadPixels + * + * This fast path
>> is taken when the source format is BGRA, RGBA, + * A or L and when
>> the texture memory is X- or Y-tiled.  It downloads + * the source
>> data by directly mapping the memory without a GTT fence. + * This
>> then needs to be de-tiled on the CPU before presenting the data to 
>> + * the user in the linear fasion. + * + * This is a performance
>> win over the conventional texture download path. + * In the
>> conventional texture download path, the texture is either mapped +
>> * through the GTT or copied to a linear buffer with the blitter
>> before + * handing off to a software path.  This allows us to avoid
>> round-tripping + * through the GPU (in the case where we would be
>> blitting) and do only a + * single copy operation. + */ +static
>> bool +intel_readpixels_tiled_memcpy(struct gl_context * ctx, +
>> GLint xoffset, GLint yoffset, +
>> GLsizei width, GLsizei height, +
>> GLenum format, GLenum type, +                              GLvoid *
>> pixels, +                              const struct
>> gl_pixelstore_attrib *pack) +{
> 
> [snip]
> 
>> + +   if (!intel_get_memcpy(rb->Format, format, type, &mem_copy,
>> &cpp)) +      return false;
> 
> The use of intel_get_memcpy here is surprising and relies on a lucky 
> coincidence, if I understand this function correctly (which maybe I
> don't). Whether the driver is copying data *to* of *from* the
> miptree, intel_get_memcpy is given the same parameter values. In
> other words, the 'tiledFormat' and 'format' parameters of
> intel_get_memcpy are symmetric. Please add a comment somewhere (maybe
> in intel_get_memcpy's Doxygen, maybe somewhere else is better) that
> documents that symmetry.
> 
> 
> Sure, I can do that.  Yes, the symmetry is a bit subtle and relies on
> the fact that the only two copy functions that are returned are
> memcpy and one for bgra both of which are symmetric.  Well, more to
> the point, they are their own inverse.  I'll add a comment to
> intel_get_memcpy to that effect.  In particular, I'll add the
> following comment to get_memcpy:
> 
> * The only two possible copy functions which are ever returned are a 
> * direct memcpy and a RGBA <-> BGRA copy function.  Since RGBA ->
> BGRA and * BGRA -> RGBA are exactly the same operation (and memcpy is
> obviously * symmetric), it doesn't matter whether the copy is from
> the tiled image * to the untiled or vice versa.  The copy function
> required is the same in * either case so this function can be used.
> 
> If you don't mind, I'll squash that in to the commit where we add the
> tiled_to_linear paths.
> 
> 
>> +/** + * \brief A fast path for glGetTexImage. + * + * This fast
>> path is taken when the source format is BGRA, RGBA, + * A or L and
>> when the texture memory is X- or Y-tiled.  It downloads + * the
>> source data by directly mapping the memory without a GTT fence. + *
>> This then needs to be de-tiled on the CPU before presenting the
>> data to + * the user in the linear fasion. + * + * This is a
>> performance win over the conventional texture download path. + * In
>> the conventional texture download path, the texture is either
>> mapped + * through the GTT or copied to a linear buffer with the
>> blitter before + * handing off to a software path.  This allows us
>> to avoid round-tripping + * through the GPU (in the case where we
>> would be blitting) and do only a + * single copy operation. + */
> 
> Rather than duplicating such a length comment block, I think it's
> better to refer back to the original block with \see
> intel_readpixels_tiled_memcpy().
> 
> 
> Sure.
> 
> 
> 
>> +bool +intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx, +
>> struct gl_texture_image *texImage, +
>> GLint xoffset, GLint yoffset, +
>> GLsizei width, GLsizei height, +
>> GLenum format, GLenum type, +
>> GLvoid *pixels, +                                  const struct
>> gl_pixelstore_attrib *packing)
> 
> The signatures and bodies of intel_gettexsubimage_tiled_memcpy and 
> intel_readpixels_tiled_memcpy are nearly identical, yet the function 
> bodies are not trivial. The two functions should really share their 
> common code. As far as I can tell, the only differences between the 
> two function bodies are: - intel_readpixels_tiled_memcpy pulls the
> miptree out of ctx->_ColorReadBuffer -
> intel_gettexsubimage_tiled_memcpy pulls the miptree out of a
> texImage - intel_gettexsubimage_tiled_memcpy does a few sanity checks
> on that texImage
> 
> 
> I looked over those again.  Yes, they are similar and we could share
> some code but it's kind of tricky.  We don't want to have too many
> "check some stuff" helper functions.  One option would be to have
> another function with the tiled_memcpy functions that does a takes a
> miptree and handles checking the tiling, doing the resolve, mapping
> the buffer etc.  However, I can't think of a good way to do the other
> packing/format checks in a shared function without running into
> "shared code for the sake of shared code".

Then forget the code sharing. Perfection is the enemy of the good.

With the comment change mentioned earlier, this series is
Reviewed-by: Chad Versace <chad.versace at intel.com>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 884 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150126/de6f039b/attachment.sig>


More information about the mesa-dev mailing list