<div dir="ltr">Hi Ken,<div>Thank you so much for reviewing and helping me out with this! I'll make the changes you suggested</div><div>to both patches and resubmit.</div><div><br></div><div>Thanks,</div><div>Pam</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 26, 2017 at 3:32 AM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Tuesday, August 29, 2017 9:24:01 AM PDT Plamena Manolova wrote:<br>
> We can implement ARB_indirect_parameters for i965 by<br>
> taking advantage of the conditional rendering mechanism.<br>
> This works by issuing maxdrawcount draw calls and using<br>
> conditional rendering to predicate each of them with<br>
> "drawcount > gl_DrawID"<br>
><br>
> Signed-off-by: Plamena Manolova <<a href="mailto:plamena.manolova@intel.com">plamena.manolova@intel.com</a>><br>
> ---<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>context.h      |  3 +<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>draw.c         | 97 ++++++++++++++++++++++++++++<br>
>  src/mesa/drivers/dri/i965/brw_<wbr>draw.h         | 11 ++++<br>
>  src/mesa/drivers/dri/i965/<wbr>intel_extensions.c |  2 +<br>
>  4 files changed, 113 insertions(+)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_context.h b/src/mesa/drivers/dri/i965/<wbr>brw_context.h<br>
> index 2274fe5..4639d5b 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_context.h<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_context.h<br>
> @@ -831,6 +831,9 @@ struct brw_context<br>
>        int gl_drawid;<br>
>        struct brw_bo *draw_id_bo;<br>
>        uint32_t draw_id_offset;<br>
> +<br>
> +      struct brw_bo *draw_params_count_bo;<br>
> +      uint32_t draw_params_count_offset;<br>
>     } draw;<br>
><br>
>     struct {<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_draw.c b/src/mesa/drivers/dri/i965/<wbr>brw_draw.c<br>
> index 7597bae..473958c 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_draw.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_draw.c<br>
> @@ -820,6 +820,11 @@ brw_end_draw_prims(struct gl_context *ctx,<br>
>     brw_program_cache_check_size(<wbr>brw);<br>
>     brw_postdraw_reconcile_align_<wbr>wa_slices(brw);<br>
>     brw_postdraw_set_buffers_need_<wbr>resolve(brw);<br>
> +<br>
> +   if (brw->draw.draw_params_count_<wbr>bo) {<br>
> +      brw_bo_unreference(brw->draw.<wbr>draw_params_count_bo);<br>
> +      brw->draw.draw_params_count_bo = NULL;<br>
> +   }<br>
>  }<br>
><br>
>  void<br>
> @@ -837,6 +842,8 @@ brw_draw_prims(struct gl_context *ctx,<br>
>     struct brw_context *brw = brw_context(ctx);<br>
>     const struct gl_vertex_array **arrays = ctx->Array._DrawArrays;<br>
>     int i;<br>
> +   int predicate_state = brw->predicate.state;<br>
> +   int combine_op = MI_PREDICATE_COMBINEOP_SET;<br>
>     struct brw_transform_feedback_object *xfb_obj =<br>
>        (struct brw_transform_feedback_object *) gl_xfb_obj;<br>
><br>
> @@ -890,12 +897,101 @@ brw_draw_prims(struct gl_context *ctx,<br>
>      * to it.<br>
>      */<br>
><br>
> +   if (brw->draw.draw_params_count_<wbr>bo &&<br>
> +       predicate_state == BRW_PREDICATE_STATE_USE_BIT) {<br>
> +      /* We do this to empty the MI_PREDICATE_DATA register */<br>
> +      BEGIN_BATCH(4);<br>
> +      OUT_BATCH(MI_PREDICATE_DATA);<br>
> +      OUT_BATCH(0u);<br>
> +      OUT_BATCH(MI_PREDICATE_DATA + 4);<br>
> +      OUT_BATCH(0u);<br>
> +      ADVANCE_BATCH();<br>
> +<br>
> +      combine_op = MI_PREDICATE_COMBINEOP_AND;<br>
> +   }<br>
> +<br>
>     for (i = 0; i < nr_prims; i++) {<br>
> +      if (brw->draw.draw_params_count_<wbr>bo) {<br>
> +         struct brw_bo *draw_id_bo = brw_bo_alloc(brw->bufmgr, "draw_id", 4, 4);<br>
> +<br>
> +         brw_bo_reference(draw_id_bo);<br>
<br>
</div></div>brw_bo_alloc starts you with a reference count of 1, so you don't want<br>
to call brw_bo_reference here - that would increase the refcount to 2,<br>
and your brw_bo_unreference() call below would only drop it to 1, which<br>
would leak the BO.<br>
<span class=""><br>
> +         brw_bo_subdata(draw_id_bo, 0, 4, &prims[i].draw_id);<br>
<br>
</span>Buffer objects are always page-aligned sizes, so we're actually<br>
allocating a full 4 kilobytes for this 32-bit integer.  That's kind<br>
of a bummer.  To avoid this, we have a streaming upload buffer which we<br>
use to append random data we want to use in the batch.  I'd use that<br>
instead:<br>
<span class=""><br>
   struct brw_bo *draw_id_bo;<br>
   uint32_t draw_id_offset;<br>
<br>
</span>   intel_upload_data(brw, &prims[i].draw_id, 4, 4,<br>
                     &draw_id_bo, &draw_id_offset);<br>
<br>
(I'm assuming prims[i].draw_id is 4-byte aligned...it should be unless<br>
people have explicitly marked the _mesa_prim struct PACKED...)<br>
<span class=""><br>
> +<br>
> +         brw_emit_pipe_control_flush(<wbr>brw, PIPE_CONTROL_FLUSH_ENABLE);<br>
> +<br>
> +         brw_load_register_mem(brw, MI_PREDICATE_SRC0,<br>
> +                               brw->draw.draw_params_count_<wbr>bo,<br>
> +                               brw->draw.draw_params_count_<wbr>offset);<br>
> +         brw_load_register_mem(brw, MI_PREDICATE_SRC1, draw_id_bo, 0);<br>
<br>
</span>then make sure to use the offset:<br>
<br>
         brw_load_register_mem(brw, MI_PREDICATE_SRC1,<br>
                               draw_id_bo, draw_id_offset);<br>
<br>
Note that intel_upload_* is only meant to be used for a single batch...<br>
if you need it to persist across multiple batches, you probably want to<br>
allocate your own BO like you originally did here.<br>
<div><div class="h5"><br>
> +<br>
> +         BEGIN_BATCH(1);<br>
> +         OUT_BATCH(GEN7_MI_PREDICATE |<br>
> +                   MI_PREDICATE_LOADOP_LOADINV | combine_op |<br>
> +                   MI_PREDICATE_COMPAREOP_DELTAS_<wbr>EQUAL);<br>
> +         ADVANCE_BATCH();<br>
> +<br>
> +         brw->predicate.state = BRW_PREDICATE_STATE_USE_BIT;<br>
> +<br>
> +         brw_bo_unreference(draw_id_bo)<wbr>;<br>
> +      }<br>
> +<br>
>        brw_try_draw_prim(ctx, arrays, &prims[i], ib, index_bounds_valid,<br>
>                          min_index, max_index, xfb_obj, stream, indirect);<br>
>     }<br>
> +<br>
>     brw_end_draw_prims(ctx, arrays, prims, nr_prims, ib, index_bounds_valid,<br>
>                        min_index, max_index, xfb_obj, stream, indirect);<br>
> +<br>
> +   brw->predicate.state = predicate_state;<br>
> +}<br>
> +<br>
> +void<br>
> +brw_draw_indirect_prims(<wbr>struct gl_context *ctx,<br>
> +                        GLuint mode,<br>
> +                        struct gl_buffer_object *indirect_data,<br>
> +                        GLsizeiptr indirect_offset,<br>
> +                        unsigned draw_count,<br>
> +                        unsigned stride,<br>
> +                        struct gl_buffer_object *indirect_params,<br>
> +                        GLsizeiptr indirect_params_offset,<br>
> +                        const struct _mesa_index_buffer *ib)<br>
> +{<br>
> +   struct brw_context *brw = brw_context(ctx);<br>
> +   struct _mesa_prim *prim;<br>
> +   GLsizei i;<br>
> +<br>
> +   prim = calloc(draw_count, sizeof(*prim));<br>
> +   if (prim == NULL) {<br>
> +      _mesa_error(ctx, GL_OUT_OF_MEMORY, "gl%sDraw%sIndirect%s",<br>
> +                  (draw_count > 1) ? "Multi" : "",<br>
> +                  ib ? "Elements" : "Arrays",<br>
> +                  indirect_params ? "CountARB" : "");<br>
> +      return;<br>
> +   }<br>
> +<br>
> +   prim[0].begin = 1;<br>
> +   prim[draw_count - 1].end = 1;<br>
> +   for (i = 0; i < draw_count; ++i, indirect_offset += stride) {<br>
> +      prim[i].mode = mode;<br>
> +      prim[i].indexed = !!ib;<br>
> +      prim[i].indirect_offset = indirect_offset;<br>
> +      prim[i].is_indirect = 1;<br>
> +      prim[i].draw_id = i;<br>
> +   }<br>
> +<br>
> +   if (indirect_params) {<br>
> +      brw->draw.draw_params_count_bo =<br>
> +         intel_buffer_object(indirect_<wbr>params)->buffer;<br>
> +      brw_bo_reference(brw->draw.<wbr>draw_params_count_bo);<br>
> +      brw->draw.draw_params_count_<wbr>offset = indirect_params_offset;<br>
> +   }<br>
> +<br>
> +   brw_draw_prims(ctx, prim, draw_count,<br>
> +                  ib, false, 0, ~0,<br>
> +                  NULL, 0,<br>
> +                  indirect_data);<br>
<br>
</div></div>One thing I considered here was that brw_draw_prims may flush the batch,<br>
causing this draw to span across multiple batches.  I think that's okay<br>
here, as you're using the high level function, which takes care of<br>
requiring space, flushing, and reflagging state as necessary.  If you<br>
were calling brw_try_draw_prim directly, I'd be more concerned about it.<br>
<br>
So, I think there's nothing to do here.<br>
<div><div class="h5"><br>
> +<br>
> +   free(prim);<br>
>  }<br>
><br>
>  void<br>
> @@ -907,6 +1003,7 @@ brw_draw_init(struct brw_context *brw)<br>
>     /* Register our drawing function:<br>
>      */<br>
>     vbo->draw_prims = brw_draw_prims;<br>
> +   vbo->draw_indirect_prims = brw_draw_indirect_prims;<br>
><br>
>     for (int i = 0; i < VERT_ATTRIB_MAX; i++)<br>
>        brw->vb.inputs[i].buffer = -1;<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_draw.h b/src/mesa/drivers/dri/i965/<wbr>brw_draw.h<br>
> index 3b99915..07aab1c 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_draw.h<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_draw.h<br>
> @@ -58,6 +58,17 @@ void brw_draw_prims(struct gl_context *ctx,<br>
>  void brw_draw_init( struct brw_context *brw );<br>
>  void brw_draw_destroy( struct brw_context *brw );<br>
><br>
> +void<br>
> +brw_draw_indirect_prims(<wbr>struct gl_context *ctx,<br>
> +                        GLuint mode,<br>
> +                        struct gl_buffer_object *indirect_data,<br>
> +                        GLsizeiptr indirect_offset,<br>
> +                        unsigned draw_count,<br>
> +                        unsigned stride,<br>
> +                        struct gl_buffer_object *indirect_params,<br>
> +                        GLsizeiptr indirect_params_offset,<br>
> +                        const struct _mesa_index_buffer *ib);<br>
> +<br>
>  void brw_prepare_shader_draw_<wbr>parameters(struct brw_context *);<br>
><br>
>  /* brw_primitive_restart.c */<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c b/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c<br>
> index deacd0d..c116d43 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>intel_extensions.c<br>
> @@ -230,6 +230,8 @@ intelInitExtensions(struct gl_context *ctx)<br>
><br>
>        if (can_do_pipelined_register_<wbr>writes(brw->screen)) {<br>
>           ctx->Extensions.ARB_draw_<wbr>indirect = true;<br>
> +         if (ctx->Extensions.ARB_<wbr>conditional_render_inverted)<br>
> +            ctx->Extensions.ARB_indirect_<wbr>parameters = true;<br>
<br>
</div></div>This isn't quite right - we enable ARB_conditional_render_<wbr>inverted<br>
unconditionally on Gen6+, but it might be implemented via fallbacks.<br>
<br>
What you really want is access to the hardware predicate support.<br>
Add it to the block below:<br>
<br>
         if (can_do_predicate_writes(brw-><wbr>screen)) {<br>
            brw->predicate.supported = true;<br>
<span class="">            ctx->Extensions.ARB_indirect_<wbr>parameters = true;<br>
         }<br>
<br>
>           ctx->Extensions.ARB_transform_<wbr>feedback2 = true;<br>
>           ctx->Extensions.ARB_transform_<wbr>feedback3 = true;<br>
>           ctx->Extensions.ARB_transform_<wbr>feedback_instanced = true;<br>
><br>
<br>
<br>
</span>With that fixed, this looks good to me!  Great work :)<br>
<br>
I'll give official Reviewed-bys when I see a v2.<br>
</blockquote></div><br></div>