[Mesa-dev] [PATCH 2/4] i965: Implement CopyTexSubImage2D via BLORP (and use it by default).

Paul Berry stereotype441 at gmail.com
Mon Feb 4 11:43:08 PST 2013


On 29 January 2013 00:36, Kenneth Graunke <kenneth at whitecape.org> wrote:

> The BLT engine has many limitations.  Currently, it can only blit
> X-tiled buffers (since we don't have a kernel API to whack the BLT
> tiling mode register), which means all depth/stencil operations get
> punted to meta code, which can be very CPU-intensive.
>
> Even if we used the BLT engine, it can't blit between buffers with
> different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture
> and a Y-tiled CMS ARGB8888 renderbuffer.  This is a fundamental
> limitation, and the only way around that is to use BLORP.
>
> Previously, BLORP only handled BlitFramebuffer.  This patch adds an
> additional frontend for doing CopyTexSubImage.  It also makes it the
> default.  This is partly to increase testing and avoid hiding bugs,
> and partly because the BLORP path can already handle more cases.  With
> trivial extensions, it should be able to handle everything the BLT can.
>
> This helps PlaneShift massively, which tries to CopyTexSubImage2D
> between depth buffers whenever a player casts a spell.  Since these
> are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU
> while delivering ~1 FPS.  This is particularly bad in an MMO setting
> because people cast spells all the time.
>
> It also helps Xonotic in 4X MSAA mode.  At default power management
> settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5).
> (This data is from v1 of the patch.)
>
> No Piglit regressions on Ivybridge (v3) or Sandybridge (v2).
>
> v2: Create a fake intel_renderbuffer to wrap the destination texture
>     image and then reuse do_blorp_blit rather than reimplementing most
>     of it.  Remove unnecessary clipping code and conditional rendering
>     check.
>
> v3: Reuse formats_match() to centralize checks; delete temporary
>     renderbuffers.  Reorganize the code.
>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> Cc: Paul Berry <stereotype441 at gmail.com>
> Cc: Chad Versace <chad.versace at linux.intel.com>
> Reviewed-and-tested-by: Carl Worth <cworth at cworth.org> [v2]
>

Should this be a candidate for the 9.1 branch?


> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 73 ++++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_context.h      |  8 +++
>  src/mesa/drivers/dri/intel/intel_fbo.c       | 30 ++++++++++++
>  src/mesa/drivers/dri/intel/intel_fbo.h       |  4 ++
>  src/mesa/drivers/dri/intel/intel_tex_copy.c  | 32 +++++++++---
>  5 files changed, 139 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index bc7916a..b037156 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -23,6 +23,7 @@
>
>  #include "main/teximage.h"
>  #include "main/fbobject.h"
> +#include "main/renderbuffer.h"
>
>  #include "glsl/ralloc.h"
>
> @@ -295,6 +296,78 @@ try_blorp_blit(struct intel_context *intel,
>     return true;
>  }
>
> +bool
> +brw_blorp_copytexsubimage(struct intel_context *intel,
> +                          struct gl_renderbuffer *src_rb,
> +                          struct gl_texture_image *dst_image,
> +                          int srcX0, int srcY0,
> +                          int dstX0, int dstY0,
> +                          int width, int height)
> +{
> +   struct gl_context *ctx = &intel->ctx;
> +   struct intel_renderbuffer *src_irb = intel_renderbuffer(src_rb);
> +   struct intel_renderbuffer *dst_irb;
> +
> +   /* BLORP is not supported before Gen6. */
> +   if (intel->gen < 6)
> +      return false;
> +
> +   /* Create a fake/wrapper renderbuffer to allow us to use do_blorp_blit(). */
> +   dst_irb = intel_create_fake_renderbuffer_wrapper(intel, dst_image);
> +   if (!dst_irb)
> +      return false;
> +
> +   struct gl_renderbuffer *dst_rb = &dst_irb->Base.Base;
> +
> +   /* We don't really have a buffer bit, but at this point it's only used by
> +    * find_miptree() to decide whether to dereference the stencil miptree.
> +    * Since there are no stencil textures, we don't want to.  0 should work.
> +    */
> +   GLbitfield buffer_bit = 0;
>

We just talked about this in person and concluded that passing 0 for
buffer_bit doesn't work.  Combined depth/stencil buffers are possible, and
since the hardware usually represents them as separate depth and stencil
buffers, I think the depth/stencil case actually needs two blits.
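
For illustration only, a minimal standalone C sketch of the two-blit idea:
issue one blit per buffer that the format actually contains.  Apart from the
two GL bit values, everything here is hypothetical: fake_format, blit_one()
and copy_depth_stencil() are made-up names, not Mesa functions, and the real
fix may look quite different.

```c
#include <stdbool.h>

/* Real GL enum values for the buffer bits. */
#define GL_DEPTH_BUFFER_BIT   0x00000100
#define GL_STENCIL_BUFFER_BIT 0x00000400

/* Hypothetical description of what a texture format contains. */
struct fake_format {
   bool has_depth;
   bool has_stencil;
};

/* Hypothetical stand-in for a single blit.  It only records which
 * buffer bit it was asked to copy and how many blits were issued.
 */
static unsigned blits_issued;
static unsigned bits_seen;

static void
blit_one(unsigned buffer_bit)
{
   blits_issued++;
   bits_seen |= buffer_bit;
}

/* Issue one blit per buffer present in the format.
 * Returns the number of blits issued by this call.
 */
static unsigned
copy_depth_stencil(const struct fake_format *fmt)
{
   unsigned before = blits_issued;

   if (fmt->has_depth)
      blit_one(GL_DEPTH_BUFFER_BIT);
   if (fmt->has_stencil)
      blit_one(GL_STENCIL_BUFFER_BIT);

   return blits_issued - before;
}
```

With a combined depth/stencil format this issues two blits, one per buffer
bit; with a depth-only format it issues one.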


> +
> +   if (!formats_match(buffer_bit, src_irb, dst_irb)) {
> +      _mesa_delete_renderbuffer(ctx, dst_rb);
> +      return false;
> +   }
> +
> +   /* Source clipping shouldn't be necessary, since copytexsubimage (in
> +    * src/mesa/main/teximage.c) calls _mesa_clip_copytexsubimage() which
> +    * takes care of it.
> +    *
> +    * Destination clipping shouldn't be necessary since the restrictions on
> +    * glCopyTexSubImage prevent the user from specifying a destination rectangle
> +    * that falls outside the bounds of the destination texture.
> +    * See error_check_subtexture_dimensions().
> +    */
> +
> +   int srcY1 = srcY0 + height;
> +   int dstX1 = dstX0 + width;
> +   int dstY1 = dstY0 + height;
> +
> +   /* Sync up the state of window system buffers.  We need to do this before
> +    * we go looking for the buffers.
> +    */
> +   intel_prepare_render(intel);
> +
> +   /* Account for the fact that in the system framebuffer, the origin is at
> +    * the lower left.
> +    */
> +   bool mirror_y = false;
> +   if (_mesa_is_winsys_fbo(ctx->ReadBuffer)) {
> +      GLint tmp = src_rb->Height - srcY0;
> +      srcY0 = src_rb->Height - srcY1;
> +      srcY1 = tmp;
> +      mirror_y = true;
> +   }
> +
> +   do_blorp_blit(intel, buffer_bit, src_irb, dst_irb,
> +                 srcX0, srcY0, dstX0, dstY0, dstX1, dstY1, false, mirror_y);
> +
> +   _mesa_delete_renderbuffer(ctx, dst_rb);
> +   return true;
> +}
> +
> +
>  GLbitfield
>  brw_blorp_framebuffer(struct intel_context *intel,
>                        GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 620f09f..324bb1d 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1217,6 +1217,14 @@ brw_blorp_framebuffer(struct intel_context *intel,
>                        GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,
>                        GLbitfield mask, GLenum filter);
>
> +bool
> +brw_blorp_copytexsubimage(struct intel_context *intel,
> +                          struct gl_renderbuffer *src_rb,
> +                          struct gl_texture_image *dst_image,
> +                          int srcX0, int srcY0,
> +                          int dstX0, int dstY0,
> +                          int width, int height);
> +
>  /* gen6_multisample_state.c */
>  void
>  gen6_emit_3dstate_multisample(struct brw_context *brw,
> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> index 4810809..37ecbd1 100644
> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> @@ -531,6 +531,36 @@ intel_renderbuffer_update_wrapper(struct intel_context *intel,
>     return true;
>  }
>
> +/**
> + * Create a fake intel_renderbuffer that wraps a gl_texture_image.
> + */
> +struct intel_renderbuffer *
> +intel_create_fake_renderbuffer_wrapper(struct intel_context *intel,
> +                                       struct gl_texture_image *image)
> +{
> +   struct gl_context *ctx = &intel->ctx;
> +   struct intel_renderbuffer *irb;
> +   struct gl_renderbuffer *rb;
> +
> +   irb = CALLOC_STRUCT(intel_renderbuffer);
> +   if (!irb) {
> +      _mesa_error(ctx, GL_OUT_OF_MEMORY, "creating renderbuffer");
> +      return NULL;
> +   }
> +
> +   rb = &irb->Base.Base;
> +
> +   _mesa_init_renderbuffer(rb, 0);
> +   rb->ClassID = INTEL_RB_CLASS;
> +
> +   if (!intel_renderbuffer_update_wrapper(intel, irb, image, image->Face)) {
> +      intel_delete_renderbuffer(ctx, rb);
> +      return NULL;
> +   }
> +
> +   return irb;
> +}
> +
>  void
>  intel_renderbuffer_set_draw_offset(struct intel_renderbuffer *irb)
>  {
> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.h b/src/mesa/drivers/dri/intel/intel_fbo.h
> index 9c48e9c..f135dea 100644
> --- a/src/mesa/drivers/dri/intel/intel_fbo.h
> +++ b/src/mesa/drivers/dri/intel/intel_fbo.h
> @@ -140,6 +140,10 @@ intel_create_wrapped_renderbuffer(struct gl_context * ctx,
>                                   int width, int height,
>                                   gl_format format);
>
> +struct intel_renderbuffer *
> +intel_create_fake_renderbuffer_wrapper(struct intel_context *intel,
> +                                       struct gl_texture_image *image);
> +
>  extern void
>  intel_fbo_init(struct intel_context *intel);
>
> diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c
> index c9cbcf4..5acdb42 100644
> --- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
> +++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
> @@ -41,6 +41,9 @@
>  #include "intel_fbo.h"
>  #include "intel_tex.h"
>  #include "intel_blit.h"
> +#ifndef I915
> +#include "brw_context.h"
> +#endif
>
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>
> @@ -177,15 +180,28 @@ intelCopyTexSubImage(struct gl_context *ctx, GLuint dims,
>                       GLint x, GLint y,
>                       GLsizei width, GLsizei height)
>  {
> -   if (dims == 3 || !intel_copy_texsubimage(intel_context(ctx),
> -                               intel_texture_image(texImage),
> -                               xoffset, yoffset,
> -                               intel_renderbuffer(rb), x, y, width, height)) {
> -      fallback_debug("%s - fallback to swrast\n", __FUNCTION__);
> -      _mesa_meta_CopyTexSubImage(ctx, dims, texImage,
> -                                 xoffset, yoffset, zoffset,
> -                                 rb, x, y, width, height);
> +   struct intel_context *intel = intel_context(ctx);
> +   if (dims != 3) {
> +#ifndef I915
> +      /* Try BLORP first.  It can handle almost everything. */
> +      if (brw_blorp_copytexsubimage(intel, rb, texImage, x, y,
> +                                    xoffset, yoffset, width, height))
> +         return;
> +#endif
> +
> +      /* Next, try the BLT engine. */
> +      if (intel_copy_texsubimage(intel_context(ctx),
> +                                 intel_texture_image(texImage),
> +                                 xoffset, yoffset,
> +                                 intel_renderbuffer(rb), x, y, width, height))
> +         return;
>     }
>
> +
> +   /* Finally, fall back to meta.  This will likely be slow. */
> +   fallback_debug("%s - fallback to swrast\n", __FUNCTION__);
> +   _mesa_meta_CopyTexSubImage(ctx, dims, texImage,
> +                              xoffset, yoffset, zoffset,
> +                              rb, x, y, width, height);
>  }
>
>
> --
> 1.8.1.2
>
>
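
As a sanity check on the winsys Y flip in the quoted
brw_blorp_copytexsubimage() hunk, here is a minimal standalone sketch of the
same arithmetic.  y_range and flip_y() are hypothetical names used only for
illustration; they are not Mesa functions.

```c
/* A half-open vertical span [y0, y1) within a buffer. */
struct y_range {
   int y0;
   int y1;
};

/* Convert a span between the two origin conventions (origin at the top
 * vs. origin at the bottom) for a buffer that is fb_height rows tall.
 * This mirrors the quoted patch: the new y0 comes from the old y1 and
 * the new y1 from the old y0.
 */
static struct y_range
flip_y(int fb_height, int y0, int y1)
{
   struct y_range r;

   r.y0 = fb_height - y1;
   r.y1 = fb_height - y0;
   return r;
}
```

The flip preserves the height of the span (y1 - y0), and applying it twice
returns the original coordinates, which is what one would expect from a
mirror about the horizontal midline.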