[Mesa-dev] [PATCH 4/4] i965: Implemente a tiled fast-path for glReadPixels and glGetTexImage

Jason Ekstrand jason at jlekstrand.net
Tue Jan 20 12:09:37 PST 2015


On Tue, Jan 13, 2015 at 10:44 AM, Chad Versace <chad.versace at intel.com>
wrote:

> On 01/12/2015 10:22 AM, Jason Ekstrand wrote:
> > From: Sisinty Sasmita Patra <sisinty.patra at intel.com>
> >
> > Added intel_readpixels_tiled_mempcpy and
> intel_gettexsubimage_tiled_mempcpy
> > functions. These are the fast paths for glReadPixels and glGetTexImage.
> >
> > On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
> > amount of time spent in glReadPixels by more than half and reduces the
> time
> > of the entire test by 10%.
> >
> > v2: Jason Ekstrand <jason.ekstrand at intel.com>
> >    - Refactor to make the functions look more like the old
> >      intel_tex_subimage_tiled_memcpy
> >    - Don't export the readpixels_tiled_memcpy function
> >    - Fix some pointer arithmatic bugs in partial image downloads (using
> >      ReadPixels with a non-zero x or y offset)
> >    - Fix a bug when ReadPixels is performed on an FBO wrapping a texture
> >      miplevel other than zero.
> >
> > v3: Jason Ekstrand <jason.ekstrand at intel.com>
> >    - Better documentation fot the *_tiled_memcpy functions
> >    - Add target restrictions for renderbuffers wrapping textures
> >
> > v4: Jason Ekstrand <jason.ekstrand at intel.com>
> >    - Only check the return value of brw_bo_map for error and not
> bo->virtual
> >
> > Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/intel_pixel_read.c | 142
> ++++++++++++++++++++++++++-
> >  src/mesa/drivers/dri/i965/intel_tex.h        |   9 ++
> >  src/mesa/drivers/dri/i965/intel_tex_image.c  | 139
> +++++++++++++++++++++++++-
> >  3 files changed, 287 insertions(+), 3 deletions(-)
>
>
> > +/**
> > + * \brief A fast path for glReadPixels
> > + *
> > + * This fast path is taken when the source format is BGRA, RGBA,
> > + * A or L and when the texture memory is X- or Y-tiled.  It downloads
> > + * the source data by directly mapping the memory without a GTT fence.
> > + * This then needs to be de-tiled on the CPU before presenting the data
> to
> > + * the user in the linear fasion.
> > + *
> > + * This is a performance win over the conventional texture download
> path.
> > + * In the conventional texture download path, the texture is either
> mapped
> > + * through the GTT or copied to a linear buffer with the blitter before
> > + * handing off to a software path.  This allows us to avoid
> round-tripping
> > + * through the GPU (in the case where we would be blitting) and do only
> a
> > + * single copy operation.
> > + */
> > +static bool
> > +intel_readpixels_tiled_memcpy(struct gl_context * ctx,
> > +                              GLint xoffset, GLint yoffset,
> > +                              GLsizei width, GLsizei height,
> > +                              GLenum format, GLenum type,
> > +                              GLvoid * pixels,
> > +                              const struct gl_pixelstore_attrib *pack)
> > +{
>
> [snip]
>
> > +
> > +   if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp))
> > +      return false;
>
> The use of intel_get_memcpy here is surprising and relies on a lucky
> coincidence, if I understand this function correctly (which maybe I don't).
> Whether the driver is copying data *to* of *from* the miptree,
> intel_get_memcpy is given the same parameter values. In other words, the
> 'tiledFormat' and 'format' parameters of intel_get_memcpy are symmetric.
> Please add a comment somewhere (maybe in intel_get_memcpy's Doxygen, maybe
> somewhere else is better) that documents that symmetry.
>

Sure, I can do that.  Yes, the symmetry is a bit subtle and relies on the
fact that the only two copy functions that are returned are memcpy and one
for bgra both of which are symmetric.  Well, more to the point, they are
their own inverse.  I'll add a comment to intel_get_memcpy to that effect.
In particular, I'll add the following comment to get_memcpy:

 * The only two possible copy functions which are ever returned are a
 * direct memcpy and a RGBA <-> BGRA copy function.  Since RGBA -> BGRA and
 * BGRA -> RGBA are exactly the same operation (and memcpy is obviously
 * symmetric), it doesn't matter whether the copy is from the tiled image
 * to the untiled or vice versa.  The copy function required is the same in
 * either case so this function can be used.

If you don't mind, I'll squash that in to the commit where we add the
tiled_to_linear paths.


> > +/**
> > + * \brief A fast path for glGetTexImage.
> > + *
> > + * This fast path is taken when the source format is BGRA, RGBA,
> > + * A or L and when the texture memory is X- or Y-tiled.  It downloads
> > + * the source data by directly mapping the memory without a GTT fence.
> > + * This then needs to be de-tiled on the CPU before presenting the data
> to
> > + * the user in the linear fasion.
> > + *
> > + * This is a performance win over the conventional texture download
> path.
> > + * In the conventional texture download path, the texture is either
> mapped
> > + * through the GTT or copied to a linear buffer with the blitter before
> > + * handing off to a software path.  This allows us to avoid
> round-tripping
> > + * through the GPU (in the case where we would be blitting) and do only
> a
> > + * single copy operation.
> > + */
>
> Rather than duplicating such a length comment block, I think it's better
> to refer back to the original block with \see
> intel_readpixels_tiled_memcpy().
>

Sure.


>
> > +bool
> > +intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
> > +                                  struct gl_texture_image *texImage,
> > +                                  GLint xoffset, GLint yoffset,
> > +                                  GLsizei width, GLsizei height,
> > +                                  GLenum format, GLenum type,
> > +                                  GLvoid *pixels,
> > +                                  const struct gl_pixelstore_attrib
> *packing)
>
> The signatures and bodies of intel_gettexsubimage_tiled_memcpy and
> intel_readpixels_tiled_memcpy are nearly identical, yet the function
> bodies are not trivial. The two functions should really share their
> common code. As far as I can tell, the only differences between the
> two function bodies are:
>   - intel_readpixels_tiled_memcpy pulls the miptree out of
> ctx->_ColorReadBuffer
>   - intel_gettexsubimage_tiled_memcpy pulls the miptree out of a texImage
>   - intel_gettexsubimage_tiled_memcpy does a few sanity checks on that
> texImage
>

I looked over those again.  Yes, they are similar and we could share some
code but it's kind of tricky.  We don't want to have too many "check some
stuff" helper functions.  One option would be to have another function with
the tiled_memcpy functions that does a takes a miptree and handles checking
the tiling, doing the resolve, mapping the buffer etc.  However, I can't
think of a good way to do the other packing/format checks in a shared
function without running into "shared code for the sake of shared code".
--Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150120/9fe4fd12/attachment.html>


More information about the mesa-dev mailing list