<div dir="ltr">On 29 January 2013 00:36, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">The BLT engine has many limitations. Currently, it can only blit<br>
X-tiled buffers (since we don't have a kernel API to whack the BLT<br>
tiling mode register), which means all depth/stencil operations get<br>
punted to meta code, which can be very CPU-intensive.<br>
<br>
Even if we used the BLT engine, it can't blit between buffers with<br>
different tiling modes, such as an X-tiled non-MSAA ARGB8888 texture<br>
and a Y-tiled CMS ARGB8888 renderbuffer. This is a fundamental<br>
limitation, and the only way around that is to use BLORP.<br>
<br>
Previously, BLORP only handled BlitFramebuffer. This patch adds an<br>
additional frontend for doing CopyTexSubImage. It also makes it the<br>
default. This is partly to increase testing and avoid hiding bugs,<br>
and partly because the BLORP path can already handle more cases. With<br>
trivial extensions, it should be able to handle everything the BLT can.<br>
<br>
This helps PlaneShift massively, which tries to CopyTexSubImage2D<br>
between depth buffers whenever a player casts a spell. Since these<br>
are Y-tiled, we hit meta and software ReadPixels paths, eating 99% CPU<br>
while delivering ~1 FPS. This is particularly bad in an MMO setting<br>
because people cast spells all the time.<br>
<br>
It also helps Xonotic in 4X MSAA mode. At default power management<br>
settings, I measured a 6.35138% +/- 0.672548% performance boost (n=5).<br>
(This data is from v1 of the patch.)<br>
<br>
</div>No Piglit regressions on Ivybridge (v3) or Sandybridge (v2).<br>
<div class="im"><br>
v2: Create a fake intel_renderbuffer to wrap the destination texture<br>
image and then reuse do_blorp_blit rather than reimplementing most<br>
of it. Remove unnecessary clipping code and conditional rendering<br>
check.<br>
<br>
</div>v3: Reuse formats_match() to centralize checks; delete temporary<br>
renderbuffers. Reorganize the code.<br>
<br>
Signed-off-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
<div class="im">Cc: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
Cc: Chad Versace <<a href="mailto:chad.versace@linux.intel.com">chad.versace@linux.intel.com</a>><br>
</div>Reviewed-and-tested-by: Carl Worth <<a href="mailto:cworth@cworth.org">cworth@cworth.org</a>> [v2]<br></blockquote><div><br></div><div>Should this be a candidate for the 9.1 branch?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
---<br>
src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 73 ++++++++++++++++++++++++++++<br>
src/mesa/drivers/dri/i965/brw_context.h | 8 +++<br>
src/mesa/drivers/dri/intel/intel_fbo.c | 30 ++++++++++++<br>
src/mesa/drivers/dri/intel/intel_fbo.h | 4 ++<br>
src/mesa/drivers/dri/intel/intel_tex_copy.c | 32 +++++++++---<br>
5 files changed, 139 insertions(+), 8 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp<br>
index bc7916a..b037156 100644<br>
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp<br>
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp<br>
@@ -23,6 +23,7 @@<br>
<br>
#include "main/teximage.h"<br>
#include "main/fbobject.h"<br>
+#include "main/renderbuffer.h"<br>
<br>
#include "glsl/ralloc.h"<br>
<br>
@@ -295,6 +296,78 @@ try_blorp_blit(struct intel_context *intel,<br>
<div class="im"> return true;<br>
}<br>
<br>
+bool<br>
+brw_blorp_copytexsubimage(struct intel_context *intel,<br>
+ struct gl_renderbuffer *src_rb,<br>
+ struct gl_texture_image *dst_image,<br>
+ int srcX0, int srcY0,<br>
+ int dstX0, int dstY0,<br>
+ int width, int height)<br>
+{<br>
+ struct gl_context *ctx = &intel->ctx;<br>
</div>+ struct intel_renderbuffer *src_irb = intel_renderbuffer(src_rb);<br>
+ struct intel_renderbuffer *dst_irb;<br>
<div class="im">+<br>
+ /* BLORP is not supported before Gen6. */<br>
+ if (intel->gen < 6)<br>
+ return false;<br>
+<br>
</div><div class="im">+ /* Create a fake/wrapper renderbuffer to allow us to use do_blorp_blit(). */<br>
</div>+ dst_irb = intel_create_fake_renderbuffer_wrapper(intel, dst_image);<br>
<div class="im">+ if (!dst_irb)<br>
+ return false;<br>
</div>+<br>
+ struct gl_renderbuffer *dst_rb = &dst_irb->Base.Base;<br>
+<br>
+ /* We don't really have a buffer bit, but at this point it's only used by<br>
+ * find_miptree() to decide whether to dereference the stencil miptree.<br>
+ * Since there are no stencil textures, we don't want to. 0 should work.<br>
+ */<br>
+ GLbitfield buffer_bit = 0;<br></blockquote><div><br></div><div>We just talked about this in person and concluded that passing 0 for buffer_bit doesn't work. Combined depth/stencil buffers are possible, and since the hardware usually represents them as separate depth and stencil buffers, I think the depth/stencil case actually needs two blits: one for depth and one for stencil.<br>
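<br>
To make the two-blit idea concrete, here is a rough sketch of how the copy could be planned. It is not Mesa code: every name in it (sketch_format, sketch_blit, sketch_plan_blits, the separate_stencil flag) is hypothetical and only stands in for whatever the driver would really check.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; these are not Mesa's real types. */
enum sketch_format {
   SKETCH_FORMAT_COLOR,
   SKETCH_FORMAT_DEPTH,
   SKETCH_FORMAT_DEPTH_STENCIL   /* combined depth/stencil texture */
};

enum sketch_blit {
   SKETCH_BLIT_COLOR,
   SKETCH_BLIT_DEPTH,
   SKETCH_BLIT_STENCIL
};

/* Fill blits[] (room for two entries) with the blits a single copy would
 * need, and return how many entries were written.
 *
 * separate_stencil is an assumption of this sketch: it says whether the
 * stencil data is stored in its own buffer rather than packed with depth.
 */
static size_t
sketch_plan_blits(enum sketch_format format, bool separate_stencil,
                  enum sketch_blit blits[2])
{
   size_t n = 0;

   assert(blits != NULL);

   switch (format) {
   case SKETCH_FORMAT_COLOR:
      blits[n++] = SKETCH_BLIT_COLOR;
      break;
   case SKETCH_FORMAT_DEPTH:
      blits[n++] = SKETCH_BLIT_DEPTH;
      break;
   case SKETCH_FORMAT_DEPTH_STENCIL:
      /* With packed storage this one blit moves both depth and stencil. */
      blits[n++] = SKETCH_BLIT_DEPTH;
      /* With separate storage, stencil needs a blit of its own. */
      if (separate_stencil)
         blits[n++] = SKETCH_BLIT_STENCIL;
      break;
   }

   return n;
}
```

In this sketch the caller would loop over the returned entries and issue one blit per entry, passing the matching buffer bit each time instead of a fixed 0.<br>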
</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
+<br>
+ if (!formats_match(buffer_bit, src_irb, dst_irb)) {<br>
+ _mesa_delete_renderbuffer(ctx, dst_rb);<br>
<div class="im">+ return false;<br>
+ }<br>
+<br>
</div><div class="im">+ /* Source clipping shouldn't be necessary, since copytexsubimage (in<br>
+ * src/mesa/main/teximage.c) calls _mesa_clip_copytexsubimage() which<br>
+ * takes care of it.<br>
+ *<br>
+ * Destination clipping shouldn't be necessary since the restrictions on<br>
+ * glCopyTexSubImage prevent the user from specifying a destination rectangle<br>
+ * that falls outside the bounds of the destination texture.<br>
+ * See error_check_subtexture_dimensions().<br>
+ */<br>
+<br>
+ int srcY1 = srcY0 + height;<br>
+ int dstX1 = dstX0 + width;<br>
+ int dstY1 = dstY0 + height;<br>
+<br>
</div><div class="im">+ /* Sync up the state of window system buffers. We need to do this before<br>
+ * we go looking for the buffers.<br>
+ */<br>
+ intel_prepare_render(intel);<br>
+<br>
</div><div class="im">+ /* Account for the fact that in the system framebuffer, the origin is at<br>
+ * the lower left.<br>
+ */<br>
+ bool mirror_y = false;<br>
+ if (_mesa_is_winsys_fbo(ctx->ReadBuffer)) {<br>
+ GLint tmp = src_rb->Height - srcY0;<br>
+ srcY0 = src_rb->Height - srcY1;<br>
+ srcY1 = tmp;<br>
+ mirror_y = true;<br>
+ }<br>
+<br>
</div>+ do_blorp_blit(intel, buffer_bit, src_irb, dst_irb,<br>
<div class="im">+ srcX0, srcY0, dstX0, dstY0, dstX1, dstY1, false, mirror_y);<br>
+<br>
</div>+ _mesa_delete_renderbuffer(ctx, dst_rb);<br>
<div class="im">+ return true;<br>
+}<br>
+<br>
+<br>
GLbitfield<br>
brw_blorp_framebuffer(struct intel_context *intel,<br>
GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1,<br>
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h<br>
</div>index 620f09f..324bb1d 100644<br>
--- a/src/mesa/drivers/dri/i965/brw_context.h<br>
+++ b/src/mesa/drivers/dri/i965/brw_context.h<br>
@@ -1217,6 +1217,14 @@ brw_blorp_framebuffer(struct intel_context *intel,<br>
<div class="im"> GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1,<br>
GLbitfield mask, GLenum filter);<br>
<br>
</div><div class="im">+bool<br>
+brw_blorp_copytexsubimage(struct intel_context *intel,<br>
+ struct gl_renderbuffer *src_rb,<br>
+ struct gl_texture_image *dst_image,<br>
+ int srcX0, int srcY0,<br>
+ int dstX0, int dstY0,<br>
+ int width, int height);<br>
+<br>
</div><div class="im"> /* gen6_multisample_state.c */<br>
void<br>
gen6_emit_3dstate_multisample(struct brw_context *brw,<br>
diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c<br>
index 4810809..37ecbd1 100644<br>
--- a/src/mesa/drivers/dri/intel/intel_fbo.c<br>
+++ b/src/mesa/drivers/dri/intel/intel_fbo.c<br>
@@ -531,6 +531,36 @@ intel_renderbuffer_update_wrapper(struct intel_context *intel,<br>
return true;<br>
}<br>
<br>
+/**<br>
+ * Create a fake intel_renderbuffer that wraps a gl_texture_image.<br>
+ */<br>
+struct intel_renderbuffer *<br>
+intel_create_fake_renderbuffer_wrapper(struct intel_context *intel,<br>
+ struct gl_texture_image *image)<br>
</div><div class="im">+{<br>
+ struct gl_context *ctx = &intel->ctx;<br>
</div><div><div class="h5">+ struct intel_renderbuffer *irb;<br>
+ struct gl_renderbuffer *rb;<br>
+<br>
+ irb = CALLOC_STRUCT(intel_renderbuffer);<br>
+ if (!irb) {<br>
+ _mesa_error(ctx, GL_OUT_OF_MEMORY, "creating renderbuffer");<br>
+ return NULL;<br>
+ }<br>
+<br>
+ rb = &irb->Base.Base;<br>
+<br>
+ _mesa_init_renderbuffer(rb, 0);<br>
+ rb->ClassID = INTEL_RB_CLASS;<br>
+<br>
+ if (!intel_renderbuffer_update_wrapper(intel, irb, image, image->Face)) {<br>
+ intel_delete_renderbuffer(ctx, rb);<br>
+ return NULL;<br>
+ }<br>
+<br>
+ return irb;<br>
+}<br>
+<br>
void<br>
intel_renderbuffer_set_draw_offset(struct intel_renderbuffer *irb)<br>
{<br>
diff --git a/src/mesa/drivers/dri/intel/intel_fbo.h b/src/mesa/drivers/dri/intel/intel_fbo.h<br>
index 9c48e9c..f135dea 100644<br>
--- a/src/mesa/drivers/dri/intel/intel_fbo.h<br>
+++ b/src/mesa/drivers/dri/intel/intel_fbo.h<br>
@@ -140,6 +140,10 @@ intel_create_wrapped_renderbuffer(struct gl_context * ctx,<br>
int width, int height,<br>
gl_format format);<br>
<br>
+struct intel_renderbuffer *<br>
+intel_create_fake_renderbuffer_wrapper(struct intel_context *intel,<br>
+ struct gl_texture_image *image);<br>
+<br>
extern void<br>
intel_fbo_init(struct intel_context *intel);<br>
<br>
diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c<br>
</div></div>index c9cbcf4..5acdb42 100644<br>
<div class="im">--- a/src/mesa/drivers/dri/intel/intel_tex_copy.c<br>
+++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c<br>
@@ -41,6 +41,9 @@<br>
#include "intel_fbo.h"<br>
#include "intel_tex.h"<br>
#include "intel_blit.h"<br>
+#ifndef I915<br>
+#include "brw_context.h"<br>
+#endif<br>
<br>
#define FILE_DEBUG_FLAG DEBUG_TEXTURE<br>
<br>
</div>@@ -177,15 +180,28 @@ intelCopyTexSubImage(struct gl_context *ctx, GLuint dims,<br>
<div><div> GLint x, GLint y,<br>
GLsizei width, GLsizei height)<br>
{<br>
- if (dims == 3 || !intel_copy_texsubimage(intel_context(ctx),<br>
- intel_texture_image(texImage),<br>
- xoffset, yoffset,<br>
- intel_renderbuffer(rb), x, y, width, height)) {<br>
- fallback_debug("%s - fallback to swrast\n", __FUNCTION__);<br>
- _mesa_meta_CopyTexSubImage(ctx, dims, texImage,<br>
- xoffset, yoffset, zoffset,<br>
- rb, x, y, width, height);<br>
+ struct intel_context *intel = intel_context(ctx);<br>
+ if (dims != 3) {<br>
+#ifndef I915<br>
+ /* Try BLORP first. It can handle almost everything. */<br>
+ if (brw_blorp_copytexsubimage(intel, rb, texImage, x, y,<br>
+ xoffset, yoffset, width, height))<br>
+ return;<br>
+#endif<br>
+<br>
+ /* Next, try the BLT engine. */<br>
+ if (intel_copy_texsubimage(intel_context(ctx),<br>
+ intel_texture_image(texImage),<br>
+ xoffset, yoffset,<br>
+ intel_renderbuffer(rb), x, y, width, height))<br>
+ return;<br>
} <br></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
+<br>
+ /* Finally, fall back to meta. This will likely be slow. */<br>
+ fallback_debug("%s - fallback to swrast\n", __FUNCTION__);<br>
+ _mesa_meta_CopyTexSubImage(ctx, dims, texImage,<br>
+ xoffset, yoffset, zoffset,<br>
+ rb, x, y, width, height);<br>
}<br>
<br>
<br>
--<br>
</div></div>1.8.1.2<br>
<br>
</blockquote></div><br></div></div>