[Mesa-dev] [PATCH 4/4] i965: Implemente a tiled fast-path for glReadPixels and glGetTexImage
Ben Widawsky
ben at bwidawsk.net
Sun Jan 4 13:07:16 PST 2015
I just did a very cursory review. I assume someone smarter than me will do a
real review, but if not, feel free to ping me.
I think all the comments apply to both functions.
On Sat, Jan 03, 2015 at 11:54:15AM -0800, Jason Ekstrand wrote:
> From: Sisinty Sasmita Patra <sisinty.patra at intel.com>
>
> Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy
> functions. These are the fast paths for glReadPixels and glGetTexImage.
>
> v2: Jason Ekstrand <jason.ekstrand at intel.com>
> - Refactor to make the functions look more like the old
> intel_tex_subimage_tiled_memcpy
> - Don't export the readpixels_tiled_memcpy function
> - Fix some pointer arithmatic bugs in partial image downloads (using
> ReadPixels with a non-zero x or y offset)
> - Fix a bug when ReadPixels is performed on an FBO wrapping a texture
> miplevel other than zero.
>
> Signed-off-by: Jason Ekstrand <jason.ekstrand at intel.com>
> ---
> src/mesa/drivers/dri/i965/intel_pixel_read.c | 134 +++++++++++++++++++++++++-
> src/mesa/drivers/dri/i965/intel_tex.h | 9 ++
> src/mesa/drivers/dri/i965/intel_tex_image.c | 137 ++++++++++++++++++++++++++-
> 3 files changed, 277 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> index beb3152..d1e7798 100644
> --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> @@ -38,14 +38,16 @@
>
> #include "brw_context.h"
> #include "intel_screen.h"
> +#include "intel_batchbuffer.h"
> #include "intel_blit.h"
> #include "intel_buffers.h"
> #include "intel_fbo.h"
> #include "intel_mipmap_tree.h"
> #include "intel_pixel.h"
> #include "intel_buffer_objects.h"
> +#include "intel_tiled_memcpy.h"
>
> -#define FILE_DEBUG_FLAG DEBUG_PIXEL
> +#define FILE_DEBUG_FLAG DEBUG_TEXTURE
>
> /* For many applications, the new ability to pull the source buffers
> * back out of the GTT and then do the packing/conversion operations
> @@ -161,17 +163,147 @@ do_blit_readpixels(struct gl_context * ctx,
> return true;
> }
>
> +/**
> + * \brief A fast path for glReadPixels
> + *
> + * This fast path is taken when the source format is BGRA, RGBA,
> + * A or L and when the texture memory is X- or Y-tiled. It downloads
> + * the source data by mapping the memory without a GTT fence, thus
> + * acquiring a linear view of the memory.
That last sentence is confusing since you're using linear differently than just
about everywhere else (though the statement is accurate).
How about something like, "It maps the source data with a CPU mapping which then
needs to be de-tiled [by the CPU] before presenting linear data back to the
user."
> + *
> + * This is a performance win over the conventional texture download path.
> + * In the conventional texture download path,
> + *
> + */
> +
> +static bool
> +intel_readpixels_tiled_memcpy(struct gl_context * ctx,
> + GLint xoffset, GLint yoffset,
> + GLsizei width, GLsizei height,
> + GLenum format, GLenum type,
> + GLvoid * pixels,
> + const struct gl_pixelstore_attrib *pack)
> +{
> + struct brw_context *brw = brw_context(ctx);
> + struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
> +
> + /* This path supports reading from color buffers only */
> + if (rb == NULL)
> + return false;
> +
> + struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> + int dst_pitch;
> +
> + /* The miptree's buffer. */
> + drm_intel_bo *bo;
> +
> + int error = 0;
> +
> + uint32_t cpp;
> + mem_copy_fn mem_copy = NULL;
> +
> + /* This fastpath is restricted to specific renderbuffer types:
> + * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support
> + * more types.
> + */
> + if (!brw->has_llc ||
> + !(type == GL_UNSIGNED_BYTE || type == GL_UNSIGNED_INT_8_8_8_8_REV) ||
> + pixels == NULL ||
> + _mesa_is_bufferobj(pack->BufferObj) ||
> + pack->Alignment > 4 ||
> + pack->SkipPixels > 0 ||
> + pack->SkipRows > 0 ||
> + (pack->RowLength != 0 && pack->RowLength != width) ||
> + pack->SwapBytes ||
> + pack->LsbFirst ||
> + pack->Invert)
> + return false;
> +
> + if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp))
> + return false;
> +
> + if (!irb->mt ||
> + (irb->mt->tiling != I915_TILING_X &&
> + irb->mt->tiling != I915_TILING_Y)) {
> + /* The algorithm is written only for X- or Y-tiled memory. */
> + return false;
> + }
> +
> + /* Since we are going to read raw data to the miptree, we need to resolve
> + * any pending fast color clears before we start.
> + */
> + intel_miptree_resolve_color(brw, irb->mt);
> +
> + bo = irb->mt->bo;
> +
> + if (drm_intel_bo_references(brw->batch.bo, bo)) {
> + perf_debug("Flushing before mapping a referenced bo.\n");
> + intel_batchbuffer_flush(brw);
> + }
I don't know much about the spec, but is this even possible for glReadPixels?
> +
> + error = brw_bo_map(brw, bo, false /* write enable */, "miptree");
> + if (error || bo->virtual == NULL) {
I realize this was probably just copied from the original code, but it should be
superfluous (as is the comment about write enable). Not really sure why Eric
initially added it.
> + DBG("%s: failed to map bo\n", __FUNCTION__);
> + return false;
> + }
> +
> + dst_pitch = _mesa_image_row_stride(pack, width, format, type);
> +
> + /* We postponed printing this message until having committed to executing
> + * the function.
> + */
> + DBG("%s: x,y=(%d,%d) (w,h)=(%d,%d) format=0x%x type=0x%x "
> + "mesa_format=0x%x tiling=%d "
> + "pack=(alignment=%d row_length=%d skip_pixels=%d skip_rows=%d)\n",
> + __FUNCTION__, xoffset, yoffset, width, height,
> + format, type, rb->Format, irb->mt->tiling,
> + pack->Alignment, pack->RowLength, pack->SkipPixels,
> + pack->SkipRows);
> +
> + /* This renderbuffer can come from a texture which may not be miplevel 0.
> + * We need to adjust accordingly.
> + */
> + if (rb->TexImage) {
> + int level = rb->TexImage->Level + rb->TexImage->TexObject->MinLevel;
> +
> + /* Adjust x and y offset based on miplevel */
> + xoffset += irb->mt->level[level].level_x;
> + yoffset += irb->mt->level[level].level_y;
> + }
Can the slice be non-zero? Should this be using
intel_miptree_get_image_offset() instead?
> +
> + tiled_to_linear(
> + xoffset * cpp, (xoffset + width) * cpp,
> + yoffset, yoffset + height,
> + pixels - (ptrdiff_t) yoffset * dst_pitch - (ptrdiff_t) xoffset * cpp,
> + bo->virtual,
> + dst_pitch, irb->mt->pitch,
> + brw->has_swizzling,
> + irb->mt->tiling,
> + mem_copy
> + );
> +
> + drm_intel_bo_unmap(bo);
> + return true;
> +}
> +
> void
> intelReadPixels(struct gl_context * ctx,
> GLint x, GLint y, GLsizei width, GLsizei height,
> GLenum format, GLenum type,
> const struct gl_pixelstore_attrib *pack, GLvoid * pixels)
> {
> + bool ok;
> +
> struct brw_context *brw = brw_context(ctx);
> bool dirty;
>
> DBG("%s\n", __FUNCTION__);
>
> + ok = intel_readpixels_tiled_memcpy(ctx, x, y, width, height,
> + format, type, pixels, pack);
> + if(ok)
> + return;
> +
> if (_mesa_is_bufferobj(pack->BufferObj)) {
> /* Using PBOs, so try the BLT based path. */
> if (do_blit_readpixels(ctx, x, y, width, height, format, type, pack,
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.h b/src/mesa/drivers/dri/i965/intel_tex.h
> index 27f7f11..f048e84 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.h
> +++ b/src/mesa/drivers/dri/i965/intel_tex.h
> @@ -68,4 +68,13 @@ intel_texsubimage_tiled_memcpy(struct gl_context *ctx,
> const struct gl_pixelstore_attrib *packing,
> bool for_glTexImage);
>
> +bool
> +intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
> + struct gl_texture_image *texImage,
> + GLint xoffset, GLint yofset,
> + GLsizei width, GLsizei height,
> + GLenum format, GLenum type,
> + GLvoid *pixels,
> + const struct gl_pixelstore_attrib *packing);
> +
> #endif
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c
> index 3317779..075315d 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_image.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
> @@ -24,7 +24,7 @@
> #include "intel_blit.h"
> #include "intel_fbo.h"
> #include "intel_image.h"
> -
> +#include "intel_tiled_memcpy.h"
> #include "brw_context.h"
>
> #define FILE_DEBUG_FLAG DEBUG_TEXTURE
> @@ -507,12 +507,145 @@ blit_texture_to_pbo(struct gl_context *ctx,
> return true;
> }
>
> +/**
> + * \brief A fast path for glGetTexImage.
> + *
> + *
> + * This fast path is taken when the texture format is BGRA, RGBA,
> + * A or L and when the texture memory is X- or Y-tiled. It downloads
> + * the texture data by mapping the texture memory without a GTT fence, thus
> + * acquiring a linear view of the memory.
> + *
> + * This is a performance win over the conventional texture download path.
> + * In the conventional texture download path,
> + *
> + */
> +
> +bool
> +intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
> + struct gl_texture_image *texImage,
> + GLint xoffset, GLint yoffset,
> + GLsizei width, GLsizei height,
> + GLenum format, GLenum type,
> + GLvoid *pixels,
> + const struct gl_pixelstore_attrib *packing)
> +{
> + struct brw_context *brw = brw_context(ctx);
> + struct intel_texture_image *image = intel_texture_image(texImage);
> + int dst_pitch;
> +
> + /* The miptree's buffer. */
> + drm_intel_bo *bo;
> +
> + int error = 0;
> +
> + uint32_t cpp;
> + mem_copy_fn mem_copy = NULL;
> +
> + /* This fastpath is restricted to specific texture types:
> + * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support
> + * more types.
> + *
> + * FINISHME: The restrictions below on packing alignment and packing row
> + * length are likely unneeded now because we calculate the destination stride
> + * with _mesa_image_row_stride. However, before removing the restrictions
> + * we need tests.
> + */
> + if (!brw->has_llc ||
> + !(type == GL_UNSIGNED_BYTE || type == GL_UNSIGNED_INT_8_8_8_8_REV) ||
> + !(texImage->TexObject->Target == GL_TEXTURE_2D ||
> + texImage->TexObject->Target == GL_TEXTURE_RECTANGLE) ||
> + pixels == NULL ||
> + _mesa_is_bufferobj(packing->BufferObj) ||
> + packing->Alignment > 4 ||
> + packing->SkipPixels > 0 ||
> + packing->SkipRows > 0 ||
> + (packing->RowLength != 0 && packing->RowLength != width) ||
> + packing->SwapBytes ||
> + packing->LsbFirst ||
> + packing->Invert)
> + return false;
> +
> + if (!intel_get_memcpy(texImage->TexFormat, format, type, &mem_copy, &cpp))
> + return false;
> +
> + /* If this is a nontrivial texture view, let another path handle it instead. */
> + if (texImage->TexObject->MinLayer)
> + return false;
> +
> + if (!image->mt ||
> + (image->mt->tiling != I915_TILING_X &&
> + image->mt->tiling != I915_TILING_Y)) {
> + /* The algorithm is written only for X- or Y-tiled memory. */
> + return false;
> + }
> +
> + /* Since we are going to write raw data to the miptree, we need to resolve
> + * any pending fast color clears before we start.
> + */
> + intel_miptree_resolve_color(brw, image->mt);
> +
> + bo = image->mt->bo;
> +
> + if (drm_intel_bo_references(brw->batch.bo, bo)) {
> + perf_debug("Flushing before mapping a referenced bo.\n");
> + intel_batchbuffer_flush(brw);
> + }
> +
> + error = brw_bo_map(brw, bo, false /* write enable */, "miptree");
> + if (error || bo->virtual == NULL) {
> + DBG("%s: failed to map bo\n", __FUNCTION__);
> + return false;
> + }
> +
> + dst_pitch = _mesa_image_row_stride(packing, width, format, type);
> +
> + DBG("%s: level=%d x,y=(%d,%d) (w,h)=(%d,%d) format=0x%x type=0x%x "
> + "mesa_format=0x%x tiling=%d "
> + "packing=(alignment=%d row_length=%d skip_pixels=%d skip_rows=%d)\n",
> + __FUNCTION__, texImage->Level, xoffset, yoffset, width, height,
> + format, type, texImage->TexFormat, image->mt->tiling,
> + packing->Alignment, packing->RowLength, packing->SkipPixels,
> + packing->SkipRows);
> +
> + int level = texImage->Level + texImage->TexObject->MinLevel;
> +
> + /* Adjust x and y offset based on miplevel */
> + xoffset += image->mt->level[level].level_x;
> + yoffset += image->mt->level[level].level_y;
> +
> + tiled_to_linear(
> + xoffset * cpp, (xoffset + width) * cpp,
> + yoffset, yoffset + height,
> + pixels - (ptrdiff_t) yoffset * dst_pitch - (ptrdiff_t) xoffset * cpp,
> + bo->virtual,
> + dst_pitch, image->mt->pitch,
> + brw->has_swizzling,
> + image->mt->tiling,
> + mem_copy
> + );
> +
> + drm_intel_bo_unmap(bo);
> + return true;
> +}
> +
> +
> static void
> intel_get_tex_image(struct gl_context *ctx,
> GLenum format, GLenum type, GLvoid *pixels,
> - struct gl_texture_image *texImage) {
> + struct gl_texture_image *texImage)
> +{
> + bool ok;
> +
> DBG("%s\n", __FUNCTION__);
>
> + ok = intel_gettexsubimage_tiled_memcpy(ctx, texImage, 0, 0,
> + texImage->Width, texImage->Height,
> + format, type, pixels, &ctx->Pack);
> +
> + if(ok)
> + return;
> +
> if (_mesa_is_bufferobj(ctx->Pack.BufferObj)) {
> /* Using PBOs, so try the BLT based path. */
> if (blit_texture_to_pbo(ctx, format, type, pixels, texImage))
> --
> 2.2.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
--
Ben Widawsky, Intel Open Source Technology Center
More information about the mesa-dev
mailing list