[Mesa-dev] [PATCH] [v2] i965: implement ARB_pipeline_statistics_query

Mon Dec 8 18:50:48 PST 2014

On Tue, Dec 02, 2014 at 11:07:34PM -0800, Ian Romanick wrote:
> Since there will be a v3 anyway, nits below...
> 
> On 12/02/2014 06:33 PM, Ben Widawsky wrote:
> > This patch implements ARB_pipeline_statistics_query. This addition to GL does
> > not add a new API. Instead, it adds new tokens to the existing query APIs. The
> > work to hook up the new tokens is trivial due to it's similarity to the previous
> > work done for the query APIs. I've implemented all the new tokens to some
> > degree, but have stubbed out the untested ones at the entry point for Begin().
> > Doing this should allow the remainder of the code to be left in.
> > 
> > The new tokens give GL clients a way to obtain stats about the GL pipeline.
> > Generally, you get the number of things going in, invocations, and number of
> > things coming out, primitives, of the various stages. There are two immediate
> > uses for this, performance information, and debugging various types of
> > misrendering. I doubt one can use these for debugging very complex applications,
> > but for piglit tests, it should be quite useful.
> > 
> > Tessellation shaders, and compute shaders are not addressed in this patch
> > because there is no upstream implementation. I've implemented how I believe
> > tessellation shader stats will work for Intel hardware (though there is a bit of
> > ambiguity). Compute shaders are a bit more interesting though, and I don't yet
> > know what we'll do there.
> 
> 
> > 
> > For the lazy, here is a link to the relevant part of the spec:
> > https://www.opengl.org/registry/specs/ARB/pipeline_statistics_query.txt
> > 
> > Running the piglit tests
> > http://lists.freedesktop.org/archives/piglit/2014-November/013321.html
> > (http://cgit.freedesktop.org/~bwidawsk/piglit/log/?h=pipe_stats)
> > yield the following results:
> > 
> >> python2 ./piglit-run.py -t stats tests/all.py output/pipeline_stats
> >> [5/5] pass: 5 Running Test(s): 5
> > 
> > Previously I was seeing the adjacent vertex test failing on certain Intel
> > hardware. I am currently not able to reproduce this, and therefore for now, I'll
> > assume it was some transient issue which has been fixed.
> > 
> > v2:
> > - Don't allow pipeline_stats to be per stream (Ilia). This would be needed for
> >   AMD_transform_feedback4, which we do not support.
> >    > If AMD_transform_feedback4 is supported then GEOMETRY_SHADER_PRIMITIVES_-
> >    > EMITTED_ARB counts primitives emitted to any of the vertex streams for which
> >    > STREAM_RASTERIZATION_AMD is enabled.
> > - Remove comment from GL3.txt because it is only used for extensions that are
> >   part of required versions (Ilia)
> > - Move the new tokens to a new XML doc instead of using the main GL4x.xml (Ilia)
> > - Add a fallthrough comment (Ilia)
> > - Only divide PS invocations by 4 on HSW+ (Ben)
> > 
> > Cc: Ilia Mirkin <imirkin at alum.mit.edu>
> > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > ---
> >  .../glapi/gen/ARB_pipeline_statistics_query.xml    |  24 ++++
> >  src/mesa/drivers/dri/i965/gen6_queryobj.c          | 121 +++++++++++++++++++++
> >  src/mesa/drivers/dri/i965/intel_extensions.c       |   1 +
> >  src/mesa/main/config.h                             |   3 +
> >  src/mesa/main/extensions.c                         |   1 +
> >  src/mesa/main/mtypes.h                             |  15 +++
> >  src/mesa/main/queryobj.c                           |  77 +++++++++++++
> >  7 files changed, 242 insertions(+)
> >  create mode 100644 src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml
> > 
> > diff --git a/src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml b/src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml
> > new file mode 100644
> > index 0000000..db37267
> > --- /dev/null
> > +++ b/src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml
> > @@ -0,0 +1,24 @@
> > +<?xml version="1.0"?>
> > +<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
> > +
> > +<!-- Note: no GLX protocol info yet. -->
> > +
> > +<OpenGLAPI>
> > +
> > +<category name="GL_ARB_pipeline_statistics_query" number="100">
> > +
> > +  <enum name="VERTICES_SUBMITTED"                      value="0x82EE"/>
> > +  <enum name="PRIMITIVES_SUBMITTED"                    value="0x82EF"/>
> > +  <enum name="VERTEX_SHADER_INVOCATIONS"               value="0x82F0"/>
> > +  <enum name="TESS_CONTROL_SHADER_PATCHES"             value="0x82F1"/>
> > +  <enum name="TESS_EVALUATION_SHADER_INVOCATIONS"      value="0x82F2"/>
> > +  <!-- <enum name="GEOMETRY_SHADER_INVOCATIONS"             value="0x887F"/> -->
> > +  <enum name="GEOMETRY_SHADER_PRIMITIVES_EMITTED"      value="0x82F3"/>
> > +  <enum name="FRAGMENT_SHADER_INVOCATIONS"             value="0x82F4"/>
> > +  <enum name="COMPUTE_SHADER_INVOCATIONS"              value="0x82F5"/>
> > +  <enum name="CLIPPING_INPUT_PRIMITIVES"               value="0x82F6"/>
> > +  <enum name="CLIPPING_OUTPUT_PRIMITIVES"              value="0x82F7"/>
> > +
> > +</category>
> > +
> > +</OpenGLAPI>
> > diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> > index 130236e..f8b9bc3 100644
> > --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
> > +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> > @@ -109,6 +109,73 @@ write_xfb_primitives_written(struct brw_context *brw,
> >     }
> >  }
> >  
> > +static inline const int
> > +pipeline_target_to_index(int target)
> > +{
> > +   if (target == GL_GEOMETRY_SHADER_INVOCATIONS)
> > +      return MAX_PIPELINE_STATISTICS - 1;
> > +   else
> > +      return target - GL_VERTICES_SUBMITTED_ARB;
> > +}
> > +
> > +static void
> > +emit_pipeline_stat(struct brw_context *brw, drm_intel_bo *bo,
> > +                   int stream, int target, int idx)
> > +{
> > +   /*
> > +    * There are 2 confusing parts to implementing the various target. The first is
> > +    * the distinction between vertices submitted and primitives submitted. The
> > +    * spec tries to clear this up, and luckily our hardware seems to understand
> > +    * it:
> > +    *
> > +    *  (8) What stage the VERTICES_SUBMITTED_ARB and PRIMITIVES_SUBMITTED_ARB
> > +    *  belong to? What do they count?
> > +    *
> > +    *    DISCUSSION: There is no separate pipeline stage introduced in the
> > +    *    specification that matches D3D's "input assembler" stage. While the
> > +    *    latest version of the GL specification mentions a "vertex puller" stage
> > +    *    in the pipeline diagram, this stage does not have a corresponding chapter
> > +    *    in the specification that introduces it.
> > +    *
> > +    *    RESOLVED: Introduce VERTICES_SUBMITTED_ARB and PRIMITIVES_SUBMITTED_ARB
> > +    *    in chapter 10, Vertex Specification and Drawing Command. They count the
> > +    *    total number of vertices and primitives processed by the GL. Including
> > +    *    multiple instances.
> > +    *
> > +    * The second confusion is the tessellation shader statistics. Our hardware has
> > +    * no statistics specific to the TE unit. Ideally we could have the HS
> > +    * primitives for TESS_CONTROL_SHADER_PATCHES_ARB, and the DS invocations as
> > +    * the register for TESS_CONTROL_SHADER_PATCHES_ARB. Unfortunately we don't
> > +    * have HS primitives, we only have HS invocations.
> > +    */
> > +
> > +   /* Everything except GEOMETRY_SHADER_INVOCATIONS can be kept in a simple
> > +    * lookup table */
> 
>  */ should be on it's own line (here and elsewhere).
> 
> > +   const uint32_t target_to_register[] = {
> 
> static.  Otherwise GCC emits code to initialize the array.
> 

sigh. I really wanted to not believe this is true.

> > +      IA_VERTICES_COUNT,   /* VERTICES_SUBMITTED */
> > +      IA_PRIMITIVES_COUNT, /* PRIMITIVES_SUBMITTED */
> > +      VS_INVOCATION_COUNT, /* VERTEX_SHADER_INVOCATIONS */
> > +      0, /* HS_INVOCATION_COUNT,*/  /* TESS_CONTROL_SHADER_PATCHES */
> > +      0, /* DS_INVOCATION_COUNT,*/  /* TESS_EVALUATION_SHADER_INVOCATIONS */
> > +      GS_PRIMITIVES_COUNT, /* GEOMETRY_SHADER_PRIMITIVES_EMITTED */
> > +      PS_INVOCATION_COUNT, /* FRAGMENT_SHADER_INVOCATIONS */
> > +      0,                   /* COMPUTE_SHADER_INVOCATIONS */
> > +      CL_INVOCATION_COUNT, /* CLIPPING_INPUT_PRIMITIVES */
> > +      CL_PRIMITIVES_COUNT, /* CLIPPING_OUTPUT_PRIMITIVES */
> > +      GS_INVOCATION_COUNT /* This one is special... */
> > +   };
> > +   STATIC_ASSERT(ARRAY_SIZE(target_to_register) == MAX_PIPELINE_STATISTICS);
> > +   uint32_t reg = target_to_register[pipeline_target_to_index(target)];
> > +   assert(reg != 0);
> > +
> > +   /* Emit a flush to make sure various parts of the pipeline are complete and
> > +    * we get an accurate value */
> > +   intel_batchbuffer_emit_mi_flush(brw);
> > +
> > +   brw_store_register_mem64(brw, bo, reg, idx);
> > +}
> > +
> > +
> >  /**
> >   * Wait on the query object's BO and calculate the final result.
> >   */
> > @@ -182,9 +249,29 @@ gen6_queryobj_get_results(struct gl_context *ctx,
> >  
> >     case GL_PRIMITIVES_GENERATED:
> >     case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
> > +   case GL_VERTICES_SUBMITTED_ARB:
> > +   case GL_PRIMITIVES_SUBMITTED_ARB:
> > +   case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> > +   case GL_GEOMETRY_SHADER_INVOCATIONS:
> > +   case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> > +   case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> > +   case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> >        query->Base.Result = results[1] - results[0];
> >        break;
> >  
> > +   case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> > +      query->Base.Result = (results[1] - results[0]);
> > +      /* HW reports this count 4X the actual value and therefore SW must divide
> > +       * the count by 4 for correct reporting.
> > +       * XXX: Not in public docs?
> > +       */
> > +      if (brw->gen >= 8 || brw->is_haswell)
> > +         query->Base.Result /= 4;
> > +      break;
> > +
> > +   case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> > +   case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> > +   case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> >     default:
> >        unreachable("Unrecognized query target in brw_queryobj_get_results()");
> >     }
> > @@ -251,6 +338,20 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)
> >        write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 0);
> >        break;
> >  
> > +   case GL_VERTICES_SUBMITTED_ARB:
> > +   case GL_PRIMITIVES_SUBMITTED_ARB:
> > +   case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> > +   case GL_GEOMETRY_SHADER_INVOCATIONS:
> > +   case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> > +   case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> > +   case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> > +   case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> > +      emit_pipeline_stat(brw, query->bo, query->Base.Stream, query->Base.Target, 0);
> > +      break;
> > +
> > +   case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> > +   case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> > +   case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> >     default:
> >        unreachable("Unrecognized query target in brw_begin_query()");
> >     }
> > @@ -289,6 +390,26 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q)
> >        write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 1);
> >        break;
> >  
> > +   case GL_VERTICES_SUBMITTED_ARB:
> > +   case GL_PRIMITIVES_SUBMITTED_ARB:
> > +   case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> > +   case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> > +   case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> > +   case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> > +   case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> > +   case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> > +   case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> > +   case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> > +      emit_pipeline_stat(brw, query->bo,
> > +                         query->Base.Stream, query->Base.Target, 1);
> > +      break;
> > +   case GL_GEOMETRY_SHADER_INVOCATIONS:
> > +      /* GEOMETRY_SHADER_INVOCATIONS has a weirdly numbered target */
> > +      emit_pipeline_stat(brw, query->bo, query->Base.Stream,
> > +                         GL_VERTICES_SUBMITTED_ARB + MAX_PIPELINE_STATISTICS - 1,
> 
> Is this actually necessary?  It looks like this parameter is eventually
> just passed to pipeline_target_to_index, which does the right thing with
> GL_GEOMETRY_SHADER_INVOCATIONS.
> 

You are correct. This code changed a bit during the various internal versions
and I forgot to update this part. I'll fix this and audit the other GS special
cases while I am there.

Thanks. All the requests look good, and I'll post it in v3.
[snip]