<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 13, 2015 at 10:44 AM, Chad Versace <span dir="ltr"><<a href="mailto:chad.versace@intel.com" target="_blank">chad.versace@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="h5">On 01/12/2015 10:22 AM, Jason Ekstrand wrote:<br> > From: Sisinty Sasmita Patra <<a href="mailto:sisinty.patra@intel.com">sisinty.patra@intel.com</a>><br> ><br> > Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy<br> > functions. These are the fast paths for glReadPixels and glGetTexImage.<br> ><br> > On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts<br> > amount of time spent in glReadPixels by more than half and reduces the time<br> > of the entire test by 10%.<br> ><br> > v2: Jason Ekstrand <<a href="mailto:jason.ekstrand@intel.com">jason.ekstrand@intel.com</a>><br> > - Refactor to make the functions look more like the old<br> > intel_tex_subimage_tiled_memcpy<br> > - Don't export the readpixels_tiled_memcpy function<br> > - Fix some pointer arithmatic bugs in partial image downloads (using<br> > ReadPixels with a non-zero x or y offset)<br> > - Fix a bug when ReadPixels is performed on an FBO wrapping a texture<br> > miplevel other than zero.<br> ><br> > v3: Jason Ekstrand <<a href="mailto:jason.ekstrand@intel.com">jason.ekstrand@intel.com</a>><br> > - Better documentation fot the *_tiled_memcpy functions<br> > - Add target restrictions for renderbuffers wrapping textures<br> ><br> > v4: Jason Ekstrand <<a href="mailto:jason.ekstrand@intel.com">jason.ekstrand@intel.com</a>><br> > - Only check the return value of brw_bo_map for error and not bo->virtual<br> ><br> > Signed-off-by: Jason Ekstrand <<a href="mailto:jason.ekstrand@intel.com">jason.ekstrand@intel.com</a>><br> > ---<br> > src/mesa/drivers/dri/i965/intel_pixel_read.c | 142 ++++++++++++++++++++++++++-<br> > src/mesa/drivers/dri/i965/intel_tex.h | 9 ++<br> > src/mesa/drivers/dri/i965/intel_tex_image.c | 139 +++++++++++++++++++++++++-<br> > 3 files changed, 287 insertions(+), 3 deletions(-)<br> <br> <br> </div></div><span class="">> +/**<br> > + * \brief A fast path for glReadPixels<br> > + *<br> > + * This fast path is taken when the source format is BGRA, RGBA,<br> > + * A or L and when the texture memory is X- or Y-tiled. It downloads<br> > + * the source data by directly mapping the memory without a GTT fence.<br> > + * This then needs to be de-tiled on the CPU before presenting the data to<br> > + * the user in the linear fasion.<br> > + *<br> > + * This is a performance win over the conventional texture download path.<br> > + * In the conventional texture download path, the texture is either mapped<br> > + * through the GTT or copied to a linear buffer with the blitter before<br> > + * handing off to a software path. This allows us to avoid round-tripping<br> > + * through the GPU (in the case where we would be blitting) and do only a<br> > + * single copy operation.<br> > + */<br> > +static bool<br> > +intel_readpixels_tiled_memcpy(struct gl_context * ctx,<br> > + GLint xoffset, GLint yoffset,<br> > + GLsizei width, GLsizei height,<br> > + GLenum format, GLenum type,<br> > + GLvoid * pixels,<br> > + const struct gl_pixelstore_attrib *pack)<br> > +{<br> <br> </span>[snip]<br> <span class=""><br> > +<br> > + if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp))<br> > + return false;<br> <br> </span>The use of intel_get_memcpy here is surprising and relies on a lucky<br> coincidence, if I understand this function correctly (which maybe I don't).<br> Whether the driver is copying data *to* of *from* the miptree,<br> intel_get_memcpy is given the same parameter values. In other words, the<br> 'tiledFormat' and 'format' parameters of intel_get_memcpy are symmetric.<br> Please add a comment somewhere (maybe in intel_get_memcpy's Doxygen, maybe<br> somewhere else is better) that documents that symmetry.<span class=""><br></span></blockquote><div><br></div><div>Sure, I can do that. Yes, the symmetry is a bit subtle and relies on the fact that the only two copy functions that are returned are memcpy and one for bgra both of which are symmetric. Well, more to the point, they are their own inverse. I'll add a comment to intel_get_memcpy to that effect. In particular, I'll add the following comment to get_memcpy:<br><br> * The only two possible copy functions which are ever returned are a<br> * direct memcpy and a RGBA <-> BGRA copy function. Since RGBA -> BGRA and<br> * BGRA -> RGBA are exactly the same operation (and memcpy is obviously<br> * symmetric), it doesn't matter whether the copy is from the tiled image<br> * to the untiled or vice versa. The copy function required is the same in<br> * either case so this function can be used.<br><br></div><div>If you don't mind, I'll squash that in to the commit where we add the tiled_to_linear paths.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class=""> <br> > +/**<br> > + * \brief A fast path for glGetTexImage.<br> > + *<br> > + * This fast path is taken when the source format is BGRA, RGBA,<br> > + * A or L and when the texture memory is X- or Y-tiled. It downloads<br> > + * the source data by directly mapping the memory without a GTT fence.<br> > + * This then needs to be de-tiled on the CPU before presenting the data to<br> > + * the user in the linear fasion.<br> > + *<br> > + * This is a performance win over the conventional texture download path.<br> > + * In the conventional texture download path, the texture is either mapped<br> > + * through the GTT or copied to a linear buffer with the blitter before<br> > + * handing off to a software path. This allows us to avoid round-tripping<br> > + * through the GPU (in the case where we would be blitting) and do only a<br> > + * single copy operation.<br> > + */<br> <br> </span>Rather than duplicating such a length comment block, I think it's better<br> to refer back to the original block with \see intel_readpixels_tiled_memcpy().<br></blockquote><div><br></div><div>Sure.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <span class=""><br> > +bool<br> > +intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,<br> > + struct gl_texture_image *texImage,<br> > + GLint xoffset, GLint yoffset,<br> > + GLsizei width, GLsizei height,<br> > + GLenum format, GLenum type,<br> > + GLvoid *pixels,<br> > + const struct gl_pixelstore_attrib *packing)<br> <br> </span>The signatures and bodies of intel_gettexsubimage_tiled_memcpy and<br> intel_readpixels_tiled_memcpy are nearly identical, yet the function<br> bodies are not trivial. The two functions should really share their<br> common code. As far as I can tell, the only differences between the<br> two function bodies are:<br> - intel_readpixels_tiled_memcpy pulls the miptree out of ctx->_ColorReadBuffer<br> - intel_gettexsubimage_tiled_memcpy pulls the miptree out of a texImage<br> - intel_gettexsubimage_tiled_memcpy does a few sanity checks on that texImage<br></blockquote><div><br></div><div>I looked over those again. Yes, they are similar and we could share some code but it's kind of tricky. We don't want to have too many "check some stuff" helper functions. One option would be to have another function with the tiled_memcpy functions that does a takes a miptree and handles checking the tiling, doing the resolve, mapping the buffer etc. However, I can't think of a good way to do the other packing/format checks in a shared function without running into "shared code for the sake of shared code".<br></div><div>--Jason<br></div></div><br></div></div>