[Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling
Kenneth Graunke
kenneth at whitecape.org
Fri May 8 11:48:50 PDT 2015
On Friday, May 08, 2015 07:04:31 PM Neil Roberts wrote:
> Previously whenever a primitive is drawn the driver would call
> _mesa_check_conditional_render which blocks waiting for the result of
> the query to determine whether to render. On Gen7+ there is a bit in
> the 3DPRIMITIVE command which can be used to disable the primitive
> based on the value of a state bit. This state bit can be set based on
> whether two registers have different values using the MI_PREDICATE
> command. We can load these two registers with the pixel count values
> stored in the query begin and end to implement conditional rendering
> without stalling.
>
> Unfortunately these two source registers were not in the whitelist of
> available registers in the kernel driver until v3.19. This patch uses
> the command parser version from intel_screen to detect whether to
> attempt to set the predicate data registers.
>
> The predicate enable bit is currently only used for drawing 3D
> primitives. For blits, clears, bitmaps, copypixels and drawpixels it
> still causes a stall. For most of these it would probably just work to
> call the new brw_check_conditional_render function instead of
> _mesa_check_conditional_render because they already work in terms of
> rendering primitives. However it's a bit trickier for blits because it
> can use the BLT ring or the blorp codepath. I think these operations
> are less useful for conditional rendering than rendering primitives so
> it might be best to leave it for a later patch.
>
> v2: Use the command parser version to detect whether we can write to
> the predicate data registers instead of trying to execute a
> register load command.
> v3: Simple rebase
> ---
> src/mesa/drivers/dri/i965/Makefile.sources | 1 +
> src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 +++++++++++++++++++++
> src/mesa/drivers/dri/i965/brw_context.c | 4 +
> src/mesa/drivers/dri/i965/brw_context.h | 21 +++
> src/mesa/drivers/dri/i965/brw_defines.h | 1 +
> src/mesa/drivers/dri/i965/brw_draw.c | 16 +-
> src/mesa/drivers/dri/i965/brw_queryobj.c | 17 ++-
> src/mesa/drivers/dri/i965/intel_extensions.c | 5 +
> src/mesa/drivers/dri/i965/intel_reg.h | 23 +++
> 9 files changed, 243 insertions(+), 12 deletions(-)
> create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
> index 1ae93e1..a24c20a 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -18,6 +18,7 @@ i965_FILES = \
> brw_clip_unfilled.c \
> brw_clip_util.c \
> brw_compute.c \
> + brw_conditional_render.c \
> brw_context.c \
> brw_context.h \
> brw_cs.cpp \
> diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c
> new file mode 100644
> index 0000000..e676aa0
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
> @@ -0,0 +1,167 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + * Neil Roberts <neil at linux.intel.com>
> + */
> +
> +/** @file brw_conditional_render.c
> + *
> + * Support for conditional rendering based on query objects
> + * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
> + */
> +
> +#include "main/imports.h"
> +#include "main/condrender.h"
> +
> +#include "brw_context.h"
> +#include "brw_defines.h"
> +#include "intel_batchbuffer.h"
> +
> +static void
> +set_predicate_enable(struct brw_context *brw,
> + bool value)
> +{
> + if (value)
> + brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
> + else
> + brw->predicate.state = BRW_PREDICATE_STATE_DONT_RENDER;
> +}
> +
> +static void
> +load_64bit_register(struct brw_context *brw,
> + uint32_t reg,
> + drm_intel_bo *bo,
> + uint32_t offset)
> +{
> + int i;
> +
> + for (i = 0; i < 2; i++) {
> + brw_load_register_mem(brw,
> + reg + i * 4,
> + bo,
> + I915_GEM_DOMAIN_INSTRUCTION,
> + 0, /* write domain */
> + offset + i * 4);
> + }
> +}
It might be nice to create a brw_load_register_mem64 function, for
symmetry with brw_store_register_mem64 - we might want to reuse it
elsewhere someday.
One interesting quirk: the two halves of your register write may land
in two separate batchbuffers, since it's done with two BEGIN_BATCH /
ADVANCE_BATCH blocks (each of which only reserves space for one LRM).
I don't think that's harmful here, but it's certainly weird. It might
be better to open code it in one big BEGIN/ADVANCE block (as I did with
brw_store_register_mem64), or call intel_batchbuffer_reserve_space()
explicitly at the top of your function.
> +static void
> +set_predicate_for_result(struct brw_context *brw,
> + struct brw_query_object *query,
> + bool inverted)
> +{
> + int load_op;
> +
> + assert(query->bo != NULL);
> +
> + load_64bit_register(brw, MI_PREDICATE_SRC0, query->bo, 0);
> + load_64bit_register(brw, MI_PREDICATE_SRC1, query->bo, 8);
> +
> + if (inverted)
> + load_op = MI_PREDICATE_LOADOP_LOAD;
> + else
> + load_op = MI_PREDICATE_LOADOP_LOADINV;
> +
> + BEGIN_BATCH(1);
> + OUT_BATCH(GEN7_MI_PREDICATE |
> + load_op |
> + MI_PREDICATE_COMBINEOP_SET |
> + MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
> + ADVANCE_BATCH();
> +
> + brw->predicate.state = BRW_PREDICATE_STATE_USE_BIT;
> +}
> +
> +static void
> +brw_begin_conditional_render(struct gl_context *ctx,
> + struct gl_query_object *q,
> + GLenum mode)
> +{
> + struct brw_context *brw = brw_context(ctx);
> + struct brw_query_object *query = (struct brw_query_object *) q;
> + bool inverted;
> +
> + if (!brw->predicate.supported)
> + return;
> +
> + switch (mode) {
> + case GL_QUERY_WAIT:
> + case GL_QUERY_NO_WAIT:
> + case GL_QUERY_BY_REGION_WAIT:
> + case GL_QUERY_BY_REGION_NO_WAIT:
> + inverted = false;
> + break;
> + case GL_QUERY_WAIT_INVERTED:
> + case GL_QUERY_NO_WAIT_INVERTED:
> + case GL_QUERY_BY_REGION_WAIT_INVERTED:
> + case GL_QUERY_BY_REGION_NO_WAIT_INVERTED:
> + inverted = true;
> + break;
> + default:
> + unreachable("Unexpected conditional render mode");
> + }
> +
> + /* If there are already samples from a BLT operation or if the query object
> + * is ready then we can avoid looking at the values in the buffer and just
> + * decide whether to draw using the CPU without stalling */
Great catch! We get to completely skip uploading draws in this case.
> + if (query->Base.Result || query->Base.Ready)
> + set_predicate_enable(brw, (query->Base.Result != 0) ^ inverted);
> + else
> + set_predicate_for_result(brw, query, inverted);
> +}
> +
> +static void
> +brw_end_conditional_render(struct gl_context *ctx,
> + struct gl_query_object *q)
> +{
> + struct brw_context *brw = brw_context(ctx);
> +
> + /* When there is no longer a conditional render in progress it should
> + * always render */
*/ goes on its own line, here and elsewhere.
> + brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
> +}
> +
> +void
> +brw_init_conditional_render_functions(struct dd_function_table *functions)
> +{
> + functions->BeginConditionalRender = brw_begin_conditional_render;
> + functions->EndConditionalRender = brw_end_conditional_render;
> +}
> +
> +bool
> +brw_check_conditional_render(struct brw_context *brw)
> +{
> + if (brw->predicate.supported) {
> + /* In some cases it is possible to determine that the primitives should
> + * be skipped without needing the predicate enable bit and still without
> + * stalling */
> + return brw->predicate.state != BRW_PREDICATE_STATE_DONT_RENDER;
> + } else {
> + if (brw->ctx.Query.CondRenderQuery) {
> + perf_debug("Conditional rendering is implemented in software and may "
> + "stall.\n");
> + }
> +
> + return _mesa_check_conditional_render(&brw->ctx);
I'd put this in the same block as the perf_debug and just do 'return
true' here - we can save the function call and redundant query check in
the common case (and this is a really hot path).
> + }
> +}
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index fd7420a..673529a 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -289,6 +289,8 @@ brw_init_driver_functions(struct brw_context *brw,
> else
> gen4_init_queryobj_functions(functions);
> brw_init_compute_functions(functions);
> + if (brw->gen >= 7)
> + brw_init_conditional_render_functions(functions);
>
> functions->QuerySamplesForFormat = brw_query_samples_for_format;
>
> @@ -891,6 +893,8 @@ brwCreateContext(gl_api api,
> brw->gs.enabled = false;
> brw->sf.viewport_transform_enable = true;
>
> + brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
> +
> ctx->VertexProgram._MaintainTnlProgram = true;
> ctx->FragmentProgram._MaintainTexEnvProgram = true;
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 834aaa4..b9383fa 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -966,6 +966,18 @@ struct brw_stage_state
> uint32_t sampler_offset;
> };
>
> +enum brw_predicate_state {
> + /* The first two states are used if we can determine whether to draw
> + * without having to look at the values in the query object buffer. This
> + * will happen if there is no conditional render in progress, if the query
> + * object is already completed or if something else has already added
> + * samples to the preliminary result such as via a BLT command. */
> + BRW_PREDICATE_STATE_RENDER,
> + BRW_PREDICATE_STATE_DONT_RENDER,
> + /* In this case whether to draw or not depends on the result of an
> + * MI_PREDICATE command so the predicate enable bit needs to be checked */
> + BRW_PREDICATE_STATE_USE_BIT
> +};
>
> /**
> * brw_context is derived from gl_context.
> @@ -1402,6 +1414,11 @@ struct brw_context
> } query;
>
> struct {
> + enum brw_predicate_state state;
> + bool supported;
> + } predicate;
> +
> + struct {
> /** A map from pipeline statistics counter IDs to MMIO addresses. */
> const int *statistics_registers;
>
> @@ -1600,6 +1617,10 @@ void brw_write_depth_count(struct brw_context *brw, drm_intel_bo *bo, int idx);
> void brw_store_register_mem64(struct brw_context *brw,
> drm_intel_bo *bo, uint32_t reg, int idx);
>
> +/** brw_conditional_render.c */
> +void brw_init_conditional_render_functions(struct dd_function_table *functions);
> +bool brw_check_conditional_render(struct brw_context *brw);
> +
> /** intel_batchbuffer.c */
> void brw_load_register_mem(struct brw_context *brw,
> uint32_t reg,
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index 83d7a35..11cb3fa 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -51,6 +51,7 @@
> # define GEN4_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL (0 << 15)
> # define GEN4_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM (1 << 15)
> # define GEN7_3DPRIM_INDIRECT_PARAMETER_ENABLE (1 << 10)
> +# define GEN7_3DPRIM_PREDICATE_ENABLE (1 << 8)
> /* DW1 */
> # define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL (0 << 8)
> # define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM (1 << 8)
> diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
> index 96e2369..a7164db 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> @@ -178,6 +178,7 @@ static void brw_emit_prim(struct brw_context *brw,
> int verts_per_instance;
> int vertex_access_type;
> int indirect_flag;
> + int predicate_enable;
>
> DBG("PRIM: %s %d %d\n", _mesa_lookup_enum_by_nr(prim->mode),
> prim->start, prim->count);
> @@ -258,10 +259,14 @@ static void brw_emit_prim(struct brw_context *brw,
> indirect_flag = 0;
> }
>
> -
> if (brw->gen >= 7) {
> + if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT)
> + predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE;
> + else
> + predicate_enable = 0;
> +
> BEGIN_BATCH(7);
> - OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag);
> + OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | predicate_enable);
> OUT_BATCH(hw_prim | vertex_access_type);
> } else {
> BEGIN_BATCH(6);
> @@ -561,12 +566,7 @@ void brw_draw_prims( struct gl_context *ctx,
>
> assert(unused_tfb_object == NULL);
>
> - if (ctx->Query.CondRenderQuery) {
> - perf_debug("Conditional rendering is implemented in software and may "
> - "stall. This should be fixed in the driver.\n");
> - }
> -
> - if (!_mesa_check_conditional_render(ctx))
> + if (!brw_check_conditional_render(brw))
> return;
>
> /* Handle primitive restart if needed */
> diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
> index 667c900..f0a50fe 100644
> --- a/src/mesa/drivers/dri/i965/brw_queryobj.c
> +++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
> @@ -66,10 +66,19 @@ brw_write_timestamp(struct brw_context *brw, drm_intel_bo *query_bo, int idx)
> void
> brw_write_depth_count(struct brw_context *brw, drm_intel_bo *query_bo, int idx)
> {
> - brw_emit_pipe_control_write(brw,
> - PIPE_CONTROL_WRITE_DEPTH_COUNT
> - | PIPE_CONTROL_DEPTH_STALL,
> - query_bo, idx * sizeof(uint64_t), 0, 0);
> + uint32_t flags;
> +
> + flags = (PIPE_CONTROL_WRITE_DEPTH_COUNT |
> + PIPE_CONTROL_DEPTH_STALL);
> +
> + /* Needed to ensure the memory is coherent for the MI_LOAD_REGISTER_MEM
> + * command when loading the values into the predicate source registers for
> + * conditional rendering */
> + if (brw->predicate.supported)
> + flags |= PIPE_CONTROL_FLUSH_ENABLE;
> +
> + brw_emit_pipe_control_write(brw, flags, query_bo,
> + idx * sizeof(uint64_t), 0, 0);
> }
>
> /**
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
> index c3eee31..cafb774 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -320,6 +320,8 @@ intelInitExtensions(struct gl_context *ctx)
> }
> }
>
> + brw->predicate.supported = false;
> +
> if (brw->gen >= 7) {
> ctx->Extensions.ARB_conservative_depth = true;
> ctx->Extensions.ARB_derivative_control = true;
> @@ -333,6 +335,9 @@ intelInitExtensions(struct gl_context *ctx)
> ctx->Extensions.ARB_transform_feedback2 = true;
> ctx->Extensions.ARB_transform_feedback3 = true;
> ctx->Extensions.ARB_transform_feedback_instanced = true;
> +
> + if (brw->intelScreen->cmd_parser_version >= 2)
> + brw->predicate.supported = true;
So, this is insufficient for Haswell. There was not a version bump when
it actually started working (I think Daniel assumed we didn't need it,
since we were already attempting to write registers ourselves.)
I think the best plan of action is to submit a kernel patch bumping the
command parser version number to 4, then change this to:
const int cmd_parser_version = brw->intelScreen->cmd_parser_version;
if (cmd_parser_version >= (brw->is_haswell ? 4 : 2))
brw->predicate.supported = true;
Would you mind sending such a kernel patch?
We should also ask them to try and backport the version bump to any
kernel containing this commit:
commit 245054a1fe33c06ad233e0d58a27ec7b64db9284
Author: Daniel Vetter <daniel.vetter at ffwll.ch>
Date: Tue Apr 14 17:35:22 2015 +0200
drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
Alternatively, we could try and write the registers, but that seems
pretty lame. The version handshake is a lot nicer.
With Haswell sorted out, this is:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
Excellent work as always!
> }
>
> /* Only enable this in core profile because other parts of Mesa behave
> diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
> index 488fb5b..bd14e18 100644
> --- a/src/mesa/drivers/dri/i965/intel_reg.h
> +++ b/src/mesa/drivers/dri/i965/intel_reg.h
> @@ -48,6 +48,20 @@
> #define GEN7_MI_LOAD_REGISTER_MEM (CMD_MI | (0x29 << 23))
> # define MI_LOAD_REGISTER_MEM_USE_GGTT (1 << 22)
>
> +/* Manipulate the predicate bit based on some register values. Only on Gen7+ */
> +#define GEN7_MI_PREDICATE (CMD_MI | (0xC << 23))
> +# define MI_PREDICATE_LOADOP_KEEP (0 << 6)
> +# define MI_PREDICATE_LOADOP_LOAD (2 << 6)
> +# define MI_PREDICATE_LOADOP_LOADINV (3 << 6)
> +# define MI_PREDICATE_COMBINEOP_SET (0 << 3)
> +# define MI_PREDICATE_COMBINEOP_AND (1 << 3)
> +# define MI_PREDICATE_COMBINEOP_OR (2 << 3)
> +# define MI_PREDICATE_COMBINEOP_XOR (3 << 3)
> +# define MI_PREDICATE_COMPAREOP_TRUE (0 << 0)
> +# define MI_PREDICATE_COMPAREOP_FALSE (1 << 0)
> +# define MI_PREDICATE_COMPAREOP_SRCS_EQUAL (2 << 0)
> +# define MI_PREDICATE_COMPAREOP_DELTAS_EQUAL (3 << 0)
> +
> /** @{
> *
> * PIPE_CONTROL operation, a combination MI_FLUSH and register write with
> @@ -69,6 +83,7 @@
> #define PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE (1 << 10) /* GM45+ only */
> #define PIPE_CONTROL_ISP_DIS (1 << 9)
> #define PIPE_CONTROL_INTERRUPT_ENABLE (1 << 8)
> +#define PIPE_CONTROL_FLUSH_ENABLE (1 << 7) /* Gen7+ only */
> /* GT */
> #define PIPE_CONTROL_DATA_CACHE_INVALIDATE (1 << 5)
> #define PIPE_CONTROL_VF_CACHE_INVALIDATE (1 << 4)
> @@ -147,3 +162,11 @@
> # define GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC (1 << 1)
> # define GEN8_HIZ_PMA_MASK_BITS \
> ((GEN8_HIZ_NP_PMA_FIX_ENABLE | GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE) << 16)
> +
> +/* Predicate registers */
> +#define MI_PREDICATE_SRC0 0x2400
> +#define MI_PREDICATE_SRC1 0x2408
> +#define MI_PREDICATE_DATA 0x2410
> +#define MI_PREDICATE_RESULT 0x2418
> +#define MI_PREDICATE_RESULT_1 0x241C
> +#define MI_PREDICATE_RESULT_2 0x2214
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150508/74fe9d9e/attachment-0001.sig>
More information about the mesa-dev
mailing list