[Mesa-dev] [PATCH] i965: Use the predicate enable bit for conditional rendering without stalling

Neil Roberts neil at linux.intel.com
Fri Nov 7 07:28:01 PST 2014


Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of the
query to determine whether to render. On Gen7+ there is a bit in the
3DPRIMITIVE command which can be used to disable the primitive based on the
value of a state bit. This state bit can be set based on whether two registers
have different values using the MI_PREDICATE command. We can load these two
registers with the pixel count values stored in the query begin and end to
implement conditional rendering without stalling.

Unfortunately these two source registers are not in the whitelist of available
registers in the kernel driver so this needs a kernel patch to work. This
patch tries to check whether it is possible to write to this register by
creating a DRM buffer with a single command to write to the register and then
trying to exec it. The return value of the exec function (and hence the ioctl)
is checked.

There is a function with a similar goal just above to check whether the
OACONTROL register can be used. This works by putting the test command in the
regular batch buffer and trying to compare whether the value was successfully
written. I couldn't get this approach to work because intel_batchbuffer_flush
actually calls exit() if the ioctl fails. I am suspicious that this approach
doesn't work for OACONTROL either.

The predicate enable bit is currently only used for drawing 3D primitives. For
blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For
most of these it would probably just work to call the new
brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of rendering
primitives. However it's a bit trickier for blits because it can use the BLT
ring or the blorp codepath. I think these operations are less useful for
conditional rendering than rendering primitives so it might be best to leave
it for a later patch.
---

I'm not really sure whether I'm using the right cache domain to load
the pixel count values into the predicate source values. Currently I'm
using the instruction domain because that is what is used to write the
values from a PIPE_CONTROL command but that doesn't seem to make much
sense because the comment for that domain says it is for shaders.

I've added the PIPE_CONTROL_FLUSH_ENABLE flag to the PIPE_CONTROL
command used to save the depth pass count because the PRM says that
you should do this. However it seems to work just as well without it
and I'm not really sure what it does. Perhaps without the bit it can
start the next command without waiting for the depth pass count to be
written and that would be bad because the next command is likely to be
the load into the predicate register. I'm not sure if it also implies
a cache flush because I would expect that the PIPE_CONTROL command and
the register load command would be using the same cache domain?

I'll post the kernel patch to the intel-gfx list.

I've also written a little demo using conditional rendering here:

https://github.com/bpeel/glthing/tree/wip/conditional-render

With the patch the FPS jumps from ~20 to ~50 although it's a bit of a
silly test because if you just completely disable conditional
rendering then the FPS goes all the way up to 270.

 src/mesa/drivers/dri/i965/Makefile.sources         |   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 +++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_context.c            |   4 +
 src/mesa/drivers/dri/i965/brw_context.h            |  21 +++
 src/mesa/drivers/dri/i965/brw_defines.h            |   1 +
 src/mesa/drivers/dri/i965/brw_draw.c               |  16 +-
 src/mesa/drivers/dri/i965/brw_queryobj.c           |  17 ++-
 src/mesa/drivers/dri/i965/intel_extensions.c       |  48 ++++++
 src/mesa/drivers/dri/i965/intel_reg.h              |  23 +++
 9 files changed, 286 insertions(+), 12 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index 711aabe..0adaf4d 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -38,6 +38,7 @@ i965_FILES = \
 	brw_clip_tri.c \
 	brw_clip_unfilled.c \
 	brw_clip_util.c \
+	brw_conditional_render.c \
 	brw_context.c \
 	brw_cubemap_normalize.cpp \
 	brw_curbe.c \
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
new file mode 100644
index 0000000..e676aa0
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Neil Roberts <neil at linux.intel.com>
+ */
+
+/** @file brw_conditional_render.c
+ *
+ * Support for conditional rendering based on query objects
+ * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
+ */
+
+#include "main/imports.h"
+#include "main/condrender.h"
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "intel_batchbuffer.h"
+
+static void
+set_predicate_enable(struct brw_context *brw,
+                     bool value)
+{
+   if (value)
+      brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
+   else
+      brw->predicate.state = BRW_PREDICATE_STATE_DONT_RENDER;
+}
+
+static void
+load_64bit_register(struct brw_context *brw,
+                    uint32_t reg,
+                    drm_intel_bo *bo,
+                    uint32_t offset)
+{
+   int i;
+
+   for (i = 0; i < 2; i++) {
+      brw_load_register_mem(brw,
+                            reg + i * 4,
+                            bo,
+                            I915_GEM_DOMAIN_INSTRUCTION,
+                            0, /* write domain */
+                            offset + i * 4);
+   }
+}
+
+static void
+set_predicate_for_result(struct brw_context *brw,
+                         struct brw_query_object *query,
+                         bool inverted)
+{
+   int load_op;
+
+   assert(query->bo != NULL);
+
+   load_64bit_register(brw, MI_PREDICATE_SRC0, query->bo, 0);
+   load_64bit_register(brw, MI_PREDICATE_SRC1, query->bo, 8);
+
+   if (inverted)
+      load_op = MI_PREDICATE_LOADOP_LOAD;
+   else
+      load_op = MI_PREDICATE_LOADOP_LOADINV;
+
+   BEGIN_BATCH(1);
+   OUT_BATCH(GEN7_MI_PREDICATE |
+             load_op |
+             MI_PREDICATE_COMBINEOP_SET |
+             MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
+   ADVANCE_BATCH();
+
+   brw->predicate.state = BRW_PREDICATE_STATE_USE_BIT;
+}
+
+static void
+brw_begin_conditional_render(struct gl_context *ctx,
+                             struct gl_query_object *q,
+                             GLenum mode)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_query_object *query = (struct brw_query_object *) q;
+   bool inverted;
+
+   if (!brw->predicate.supported)
+      return;
+
+   switch (mode) {
+   case GL_QUERY_WAIT:
+   case GL_QUERY_NO_WAIT:
+   case GL_QUERY_BY_REGION_WAIT:
+   case GL_QUERY_BY_REGION_NO_WAIT:
+      inverted = false;
+      break;
+   case GL_QUERY_WAIT_INVERTED:
+   case GL_QUERY_NO_WAIT_INVERTED:
+   case GL_QUERY_BY_REGION_WAIT_INVERTED:
+   case GL_QUERY_BY_REGION_NO_WAIT_INVERTED:
+      inverted = true;
+      break;
+   default:
+      unreachable("Unexpected conditional render mode");
+   }
+
+   /* If there are already samples from a BLT operation or if the query object
+    * is ready then we can avoid looking at the values in the buffer and just
+    * decide whether to draw using the CPU without stalling */
+   if (query->Base.Result || query->Base.Ready)
+      set_predicate_enable(brw, (query->Base.Result != 0) ^ inverted);
+   else
+      set_predicate_for_result(brw, query, inverted);
+}
+
+static void
+brw_end_conditional_render(struct gl_context *ctx,
+                           struct gl_query_object *q)
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   /* When there is no longer a conditional render in progress it should
+    * always render */
+   brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
+}
+
+void
+brw_init_conditional_render_functions(struct dd_function_table *functions)
+{
+   functions->BeginConditionalRender = brw_begin_conditional_render;
+   functions->EndConditionalRender = brw_end_conditional_render;
+}
+
+bool
+brw_check_conditional_render(struct brw_context *brw)
+{
+   if (brw->predicate.supported) {
+      /* In some cases it is possible to determine that the primitives should
+       * be skipped without needing the predicate enable bit and still without
+       * stalling */
+      return brw->predicate.state != BRW_PREDICATE_STATE_DONT_RENDER;
+   } else {
+      if (brw->ctx.Query.CondRenderQuery) {
+         perf_debug("Conditional rendering is implemented in software and may "
+                    "stall.\n");
+      }
+
+      return _mesa_check_conditional_render(&brw->ctx);
+   }
+}
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index e1a994a..52ffac0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -284,6 +284,8 @@ brw_init_driver_functions(struct brw_context *brw,
       gen6_init_queryobj_functions(functions);
    else
       gen4_init_queryobj_functions(functions);
+   if (brw->gen >= 7)
+      brw_init_conditional_render_functions(functions);
 
    functions->QuerySamplesForFormat = brw_query_samples_for_format;
 
@@ -821,6 +823,8 @@ brwCreateContext(gl_api api,
    brw->gs.enabled = false;
    brw->sf.viewport_transform_enable = true;
 
+   brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
+
    ctx->VertexProgram._MaintainTnlProgram = true;
    ctx->FragmentProgram._MaintainTexEnvProgram = true;
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 656cbe8..553d38b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -925,6 +925,18 @@ struct brw_stage_state
    uint32_t sampler_offset;
 };
 
+enum brw_predicate_state {
+   /* The first two states are used if we can determine whether to draw
+    * without having to look at the values in the query object buffer. This
+    * will happen if there is no conditional render in progress, if the query
+    * object is already completed or if something else has already added
+    * samples to the preliminary result such as via a BLT command. */
+   BRW_PREDICATE_STATE_RENDER,
+   BRW_PREDICATE_STATE_DONT_RENDER,
+   /* In this case whether to draw or not depends on the result of an
+    * MI_PREDICATE command so the predicate enable bit needs to be checked */
+   BRW_PREDICATE_STATE_USE_BIT
+};
 
 /**
  * brw_context is derived from gl_context.
@@ -1325,6 +1337,11 @@ struct brw_context
    } query;
 
    struct {
+      enum brw_predicate_state state;
+      bool supported;
+   } predicate;
+
+   struct {
       /** A map from pipeline statistics counter IDs to MMIO addresses. */
       const int *statistics_registers;
 
@@ -1520,6 +1537,10 @@ void brw_write_depth_count(struct brw_context *brw, drm_intel_bo *bo, int idx);
 void brw_store_register_mem64(struct brw_context *brw,
                               drm_intel_bo *bo, uint32_t reg, int idx);
 
+/** brw_conditional_render.c */
+void brw_init_conditional_render_functions(struct dd_function_table *functions);
+bool brw_check_conditional_render(struct brw_context *brw);
+
 /** intel_batchbuffer.c */
 void brw_load_register_mem(struct brw_context *brw,
                            uint32_t reg,
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index ee3d871..58e53fa 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -51,6 +51,7 @@
 # define GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL (0 << 15)
 # define GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM     (1 << 15)
 # define GEN7_3DPRIM_INDIRECT_PARAMETER_ENABLE      (1 << 10)
+# define GEN7_3DPRIM_PREDICATE_ENABLE               (1 << 8)
 /* DW1 */
 # define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL (0 << 8)
 # define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM     (1 << 8)
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index b28eaf2..b6ee9b8 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -177,6 +177,7 @@ static void brw_emit_prim(struct brw_context *brw,
    int verts_per_instance;
    int vertex_access_type;
    int indirect_flag;
+   int predicate_enable;
 
    DBG("PRIM: %s %d %d\n", _mesa_lookup_enum_by_nr(prim->mode),
        prim->start, prim->count);
@@ -251,10 +252,14 @@ static void brw_emit_prim(struct brw_context *brw,
       indirect_flag = 0;
    }
 
-
    if (brw->gen >= 7) {
+      if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT)
+         predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE;
+      else
+         predicate_enable = 0;
+
       BEGIN_BATCH(7);
-      OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag);
+      OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | predicate_enable);
       OUT_BATCH(hw_prim | vertex_access_type);
    } else {
       BEGIN_BATCH(6);
@@ -524,12 +529,7 @@ void brw_draw_prims( struct gl_context *ctx,
 
    assert(unused_tfb_object == NULL);
 
-   if (ctx->Query.CondRenderQuery) {
-      perf_debug("Conditional rendering is implemented in software and may "
-                 "stall.  This should be fixed in the driver.\n");
-   }
-
-   if (!_mesa_check_conditional_render(ctx))
+   if (!brw_check_conditional_render(brw))
       return;
 
    /* Handle primitive restart if needed */
diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index c053c34..ae2b7e5 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -66,14 +66,23 @@ brw_write_timestamp(struct brw_context *brw, drm_intel_bo *query_bo, int idx)
 void
 brw_write_depth_count(struct brw_context *brw, drm_intel_bo *query_bo, int idx)
 {
+   uint32_t flags;
+
    /* Emit Sandybridge workaround flush: */
    if (brw->gen == 6)
       intel_emit_post_sync_nonzero_flush(brw);
 
-   brw_emit_pipe_control_write(brw,
-                               PIPE_CONTROL_WRITE_DEPTH_COUNT
-                               | PIPE_CONTROL_DEPTH_STALL,
-                               query_bo, idx * sizeof(uint64_t), 0, 0);
+   flags = (PIPE_CONTROL_WRITE_DEPTH_COUNT |
+            PIPE_CONTROL_DEPTH_STALL);
+
+   /* Needed to ensure the memory is coherent for the MI_LOAD_REGISTER_MEM
+    * command when loading the values into the predicate source registers for
+    * conditional rendering */
+   if (brw->predicate.supported)
+      flags |= PIPE_CONTROL_FLUSH_ENABLE;
+
+   brw_emit_pipe_control_write(brw, flags, query_bo,
+                               idx * sizeof(uint64_t), 0, 0);
 }
 
 /**
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index bbbb76f..392e488 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -153,6 +153,49 @@ can_write_oacontrol(struct brw_context *brw)
    return success;
 }
 
+static bool
+can_write_to_predicate_sources(struct brw_context *brw)
+{
+   static uint32_t command[] = {
+      MI_LOAD_REGISTER_IMM | (3 - 2),
+      MI_PREDICATE_SRC0,
+      0x42424242,
+      MI_BATCH_BUFFER_END
+   };
+   drm_intel_bo *bo;
+   int ret;
+
+   if (brw->gen < 7 || brw->hw_ctx == NULL)
+      return false;
+   if (brw->gen >= 8)
+      return true;
+
+   /* On Gen7 the kernel has a whitelist for registers that can be written to.
+    * If the register write is disallowed the execbuffer ioctl will fail. The
+    * predicate source registers were only added to the whitelist later on so
+    * we need to check for it. This just tries writing an immediate value. We
+    * can safely set one of the predicate source registers to anything at this
+    * point because it will always be reset to something else before it is
+    * used. We can't just use the normal batch buffer mecanhism because
+    * _intel_bactchbuffer_flush will call exit() if the ioctl fails.
+    */
+   bo = drm_intel_bo_alloc(brw->bufmgr, "batchbuffer", sizeof command, 4096);
+
+   if (bo == NULL)
+      return false;
+
+   drm_intel_bo_subdata(bo, 0, sizeof command, command);
+
+   ret = drm_intel_gem_bo_context_exec(bo,
+                                       brw->hw_ctx,
+                                       sizeof command,
+                                       I915_EXEC_RENDER);
+
+   drm_intel_bo_unreference(bo);
+
+   return ret == 0;
+}
+
 /**
  * Initializes potential list of extensions if ctx == NULL, or actually enables
  * extensions for a context.
@@ -286,6 +329,8 @@ intelInitExtensions(struct gl_context *ctx)
       ctx->Extensions.ARB_texture_query_levels = ctx->Const.GLSLVersion >= 130;
    }
 
+   brw->predicate.supported = false;
+
    if (brw->gen >= 7) {
       ctx->Extensions.ARB_conservative_depth = true;
       ctx->Extensions.ARB_texture_view = true;
@@ -294,6 +339,9 @@ intelInitExtensions(struct gl_context *ctx)
          ctx->Extensions.ARB_transform_feedback3 = true;
          ctx->Extensions.ARB_transform_feedback_instanced = true;
          ctx->Extensions.ARB_draw_indirect = true;
+
+         if (can_write_to_predicate_sources(brw))
+            brw->predicate.supported = true;
       }
 
       /* Only enable this in core profile because other parts of Mesa behave
diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
index 5ac0180..51bb180 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -48,6 +48,20 @@
 #define GEN7_MI_LOAD_REGISTER_MEM	(CMD_MI | (0x29 << 23))
 # define MI_LOAD_REGISTER_MEM_USE_GGTT		(1 << 22)
 
+/* Manipulate the predicate bit based on some register values. Only on Gen7+ */
+#define GEN7_MI_PREDICATE		(CMD_MI | (0xC << 23))
+# define MI_PREDICATE_LOADOP_KEEP		(0 << 6)
+# define MI_PREDICATE_LOADOP_LOAD		(2 << 6)
+# define MI_PREDICATE_LOADOP_LOADINV		(3 << 6)
+# define MI_PREDICATE_COMBINEOP_SET		(0 << 3)
+# define MI_PREDICATE_COMBINEOP_AND		(1 << 3)
+# define MI_PREDICATE_COMBINEOP_OR		(2 << 3)
+# define MI_PREDICATE_COMBINEOP_XOR		(3 << 3)
+# define MI_PREDICATE_COMPAREOP_TRUE		(0 << 0)
+# define MI_PREDICATE_COMPAREOP_FALSE		(1 << 0)
+# define MI_PREDICATE_COMPAREOP_SRCS_EQUAL	(2 << 0)
+# define MI_PREDICATE_COMPAREOP_DELTAS_EQUAL	(3 << 0)
+
 /** @{
  *
  * PIPE_CONTROL operation, a combination MI_FLUSH and register write with
@@ -69,6 +83,7 @@
 #define PIPE_CONTROL_TC_FLUSH		(1 << 10) /* GM45+ only */
 #define PIPE_CONTROL_ISP_DIS		(1 << 9)
 #define PIPE_CONTROL_INTERRUPT_ENABLE	(1 << 8)
+#define PIPE_CONTROL_FLUSH_ENABLE	(1 << 7) /* Gen7+ only */
 /* GT */
 #define PIPE_CONTROL_VF_CACHE_INVALIDATE	(1 << 4)
 #define PIPE_CONTROL_CONST_CACHE_INVALIDATE	(1 << 3)
@@ -144,3 +159,11 @@
 # define GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE (1 << 13)
 # define GEN8_HIZ_PMA_MASK_BITS \
    ((GEN8_HIZ_NP_PMA_FIX_ENABLE | GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE) << 16)
+
+/* Predicate registers */
+#define MI_PREDICATE_SRC0               0x2400
+#define MI_PREDICATE_SRC1               0x2408
+#define MI_PREDICATE_DATA               0x2410
+#define MI_PREDICATE_RESULT             0x2418
+#define MI_PREDICATE_RESULT_1           0x241C
+#define MI_PREDICATE_RESULT_2           0x2214
-- 
1.9.3



More information about the mesa-dev mailing list