[Mesa-dev] [PATCH] i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams.

Kenneth Graunke kenneth at whitecape.org
Wed Sep 10 11:29:03 PDT 2014


On Wednesday, September 10, 2014 10:16:13 AM Iago Toral Quiroga wrote:
> So far we have been using CL_INVOCATION_COUNT to resolve this query but this
> is no good with streams, as only stream 0 reaches the clipping stage.
> 
> From ARB_transform_feedback3:
> 
> "When a generated primitive query for a vertex stream is active, the
>  primitives-generated count is incremented every time a primitive emitted to
>  that stream reaches the Discarding Rasterization stage (see Section 3.x)
>  right before rasterization. This counter is incremented whether or not
>  transform feedback is active."
> 
> Unfortunately, we don't have any registers that provide the number of primitives
> written to a specific stream other than the ones that track the number of
> primitives written to transform feedback in the SOL stage, so we can't
> implement this exactly as specified.
> 
> In the past we implemented this feature by activating the SOL unit even if
> transform feedback was disabled, but making it so that all buffers were
> disabled and it only recorded statistics, which gave us the right semantics
> (see 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5). Unfortunately, this came with
> a significant performance impact and had to be reverted.
> 
> This new take does not intend to implement the exact semantics required by
> the spec, but improves on what we have now, since we currently return the
> primitive count for stream 0 in all cases. With this patch we use
> GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries
> for non-zero streams. This would return the number of primitives written
> to transform feedback for each stream instead. Since non-zero streams are
> only useful in combination with transform feedback this should not be too
> bad, and the only case that I think we would not be supporting would be
> the one in which we want to use both GL_PRIMITIVES_GENERATED and
> GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to
> detect buffer overflow.
> 
> This patch also fixes the following piglit test:
> arb_gpu_shader5-xfb-streams-without-invocations
> 
> This test uses both GL_PRIMITIVES_GENERATED and
> GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it
> never hits the overflow case, so both queries are always expected to return
> the same value.
> ---
>  src/mesa/drivers/dri/i965/gen6_queryobj.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> This was my original patch in the multi-stream series to deal with this before
> Chris suggested the trick of enabling the SOL unit even if TF was not active.
> Since we had to revert the patch that implemented Chris's suggestion, maybe this
> makes sense until we figure out a better solution.
> 
> diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> index 3aec99c..130236e 100644
> --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
> +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> @@ -84,11 +84,16 @@ brw_store_register_mem64(struct brw_context *brw,
>  
>  static void
>  write_primitives_generated(struct brw_context *brw,
> -                           drm_intel_bo *query_bo, int idx)
> +                           drm_intel_bo *query_bo, int stream, int idx)
>  {
>     intel_batchbuffer_emit_mi_flush(brw);
>  
> -   brw_store_register_mem64(brw, query_bo, CL_INVOCATION_COUNT, idx);
> +   if (brw->gen >= 7 && stream > 0) {
> +      brw_store_register_mem64(brw, query_bo,
> +                               GEN7_SO_PRIM_STORAGE_NEEDED(stream), idx);
> +   } else {
> +      brw_store_register_mem64(brw, query_bo, CL_INVOCATION_COUNT, idx);
> +   }
>  }
>  
>  static void
> @@ -239,7 +244,7 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)
>        break;
>  
>     case GL_PRIMITIVES_GENERATED:
> -      write_primitives_generated(brw, query->bo, 0);
> +      write_primitives_generated(brw, query->bo, query->Base.Stream, 0);
>        break;
>  
>     case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
> @@ -277,7 +282,7 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q)
>        break;
>  
>     case GL_PRIMITIVES_GENERATED:
> -      write_primitives_generated(brw, query->bo, 1);
> +      write_primitives_generated(brw, query->bo, query->Base.Stream, 1);
>        break;
>  
>     case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
> 

I think this is a good idea.  The only cases I can think of where this deviates from the spec are:

1. Outputting primitives to streams 1-3 when transform feedback is disabled (SOL off, so no statistics).
2. Outputting primitives to streams 1-3 when recording 0 varyings (nothing being written, so no statistics).

I don't expect anything except a test case would hit those cases.  Streams 1-3 are only for use with transform feedback - you can't render with them - so if you've disabled transform feedback or elected not to record any varyings, any primitives you generate just go nowhere.  So we're only failing to count useless primitives.  And I'm okay with that.
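
To make the two deviating cases concrete, here is a minimal sketch in C of the register selection and its consequences. It is a simplified model, not driver code: the enum, both function names, and all parameters are made up for illustration, and it assumes Gen7+ hardware where the SOL statistic for a stream only advances when transform feedback is active and at least one varying is recorded for that stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of which counter backs GL_PRIMITIVES_GENERATED. */
enum counter_source {
   SOURCE_CL_INVOCATION_COUNT,    /* clipper invocations; only sees stream 0 */
   SOURCE_SO_PRIM_STORAGE_NEEDED  /* per-stream SOL statistic (Gen7+) */
};

/* Mirrors the condition in the patch: gen >= 7 && stream > 0. */
static enum counter_source
primitives_generated_source(int gen, int stream)
{
   assert(stream >= 0 && stream < 4);
   return (gen >= 7 && stream > 0) ? SOURCE_SO_PRIM_STORAGE_NEEDED
                                   : SOURCE_CL_INVOCATION_COUNT;
}

/*
 * Count the query would report on Gen7+ under this model, given how many
 * primitives the shader emitted to the stream. The spec would report
 * `emitted` in every case.
 */
static uint64_t
modeled_stream_count(int stream, uint64_t emitted,
                     bool xfb_active, unsigned recorded_varyings)
{
   if (primitives_generated_source(7, stream) == SOURCE_CL_INVOCATION_COUNT)
      return emitted;  /* stream 0 reaches the clipper regardless of SOL */

   /* Assumption: no SOL statistics when SOL is off or nothing is recorded. */
   return (xfb_active && recorded_varyings > 0) ? emitted : 0;
}
```

Under these assumptions, `modeled_stream_count(1, 10, false, 2)` and `modeled_stream_count(3, 10, true, 0)` both give 0 where the spec would give 10, which are exactly cases 1 and 2 above. Every other combination matches the spec.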

Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
Cc: "10.3" <mesa-stable at lists.freedesktop.org>