[Mesa-dev] [PATCH] [v2] i965: implement ARB_pipeline_statistics_query
Ben Widawsky
ben at bwidawsk.net
Mon Dec 8 18:50:48 PST 2014
On Tue, Dec 02, 2014 at 11:07:34PM -0800, Ian Romanick wrote:
> Since there will be a v3 anyway, nits below...
>
> On 12/02/2014 06:33 PM, Ben Widawsky wrote:
> > This patch implements ARB_pipeline_statistics_query. This addition to GL does
> > not add a new API. Instead, it adds new tokens to the existing query APIs. The
> > work to hook up the new tokens is trivial due to its similarity to the previous
> > work done for the query APIs. I've implemented all the new tokens to some
> > degree, but have stubbed out the untested ones at the entry point for Begin().
> > Doing this should allow the remainder of the code to be left in.
> >
> > The new tokens give GL clients a way to obtain stats about the GL pipeline.
> > Generally, you get the number of things going in (invocations) and the number
> > of things coming out (primitives) of the various stages. There are two immediate
> > uses for this: performance information and debugging various types of
> > misrendering. I doubt one can use these for debugging very complex applications,
> > but for piglit tests, it should be quite useful.
> >
> > Tessellation shaders and compute shaders are not addressed in this patch
> > because there is no upstream implementation. I've implemented how I believe
> > tessellation shader stats will work for Intel hardware (though there is a bit of
> > ambiguity). Compute shaders are a bit more interesting though, and I don't yet
> > know what we'll do there.
>
>
> >
> > For the lazy, here is a link to the relevant part of the spec:
> > https://www.opengl.org/registry/specs/ARB/pipeline_statistics_query.txt
> >
> > Running the piglit tests
> > http://lists.freedesktop.org/archives/piglit/2014-November/013321.html
> > (http://cgit.freedesktop.org/~bwidawsk/piglit/log/?h=pipe_stats)
> > yields the following results:
> >
> >> python2 ./piglit-run.py -t stats tests/all.py output/pipeline_stats
> >> [5/5] pass: 5 Running Test(s): 5
> >
> > Previously I was seeing the adjacent vertex test failing on certain Intel
> > hardware. I am currently not able to reproduce this, and therefore for now, I'll
> > assume it was some transient issue which has been fixed.
> >
> > v2:
> > - Don't allow pipeline_stats to be per stream (Ilia). This would be needed for
> > AMD_transform_feedback4, which we do not support.
> > > If AMD_transform_feedback4 is supported then
> > > GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB counts primitives emitted to any of
> > > the vertex streams for which STREAM_RASTERIZATION_AMD is enabled.
> > - Remove comment from GL3.txt because it is only used for extensions that are
> > part of required versions (Ilia)
> > - Move the new tokens to a new XML doc instead of using the main GL4x.xml (Ilia)
> > - Add a fallthrough comment (Ilia)
> > - Only divide PS invocations by 4 on HSW+ (Ben)
> >
> > Cc: Ilia Mirkin <imirkin at alum.mit.edu>
> > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > ---
> > .../glapi/gen/ARB_pipeline_statistics_query.xml | 24 ++++
> > src/mesa/drivers/dri/i965/gen6_queryobj.c | 121 +++++++++++++++++++++
> > src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
> > src/mesa/main/config.h | 3 +
> > src/mesa/main/extensions.c | 1 +
> > src/mesa/main/mtypes.h | 15 +++
> > src/mesa/main/queryobj.c | 77 +++++++++++++
> > 7 files changed, 242 insertions(+)
> > create mode 100644 src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml
> >
> > diff --git a/src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml b/src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml
> > new file mode 100644
> > index 0000000..db37267
> > --- /dev/null
> > +++ b/src/mapi/glapi/gen/ARB_pipeline_statistics_query.xml
> > @@ -0,0 +1,24 @@
> > +<?xml version="1.0"?>
> > +<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
> > +
> > +<!-- Note: no GLX protocol info yet. -->
> > +
> > +<OpenGLAPI>
> > +
> > +<category name="GL_ARB_pipeline_statistics_query" number="100">
> > +
> > + <enum name="VERTICES_SUBMITTED" value="0x82EE"/>
> > + <enum name="PRIMITIVES_SUBMITTED" value="0x82EF"/>
> > + <enum name="VERTEX_SHADER_INVOCATIONS" value="0x82F0"/>
> > + <enum name="TESS_CONTROL_SHADER_PATCHES" value="0x82F1"/>
> > + <enum name="TESS_EVALUATION_SHADER_INVOCATIONS" value="0x82F2"/>
> > + <!-- <enum name="GEOMETRY_SHADER_INVOCATIONS" value="0x887F"/> -->
> > + <enum name="GEOMETRY_SHADER_PRIMITIVES_EMITTED" value="0x82F3"/>
> > + <enum name="FRAGMENT_SHADER_INVOCATIONS" value="0x82F4"/>
> > + <enum name="COMPUTE_SHADER_INVOCATIONS" value="0x82F5"/>
> > + <enum name="CLIPPING_INPUT_PRIMITIVES" value="0x82F6"/>
> > + <enum name="CLIPPING_OUTPUT_PRIMITIVES" value="0x82F7"/>
> > +
> > +</category>
> > +
> > +</OpenGLAPI>
> > diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> > index 130236e..f8b9bc3 100644
> > --- a/src/mesa/drivers/dri/i965/gen6_queryobj.c
> > +++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c
> > @@ -109,6 +109,73 @@ write_xfb_primitives_written(struct brw_context *brw,
> > }
> > }
> >
> > +static inline const int
> > +pipeline_target_to_index(int target)
> > +{
> > + if (target == GL_GEOMETRY_SHADER_INVOCATIONS)
> > + return MAX_PIPELINE_STATISTICS - 1;
> > + else
> > + return target - GL_VERTICES_SUBMITTED_ARB;
> > +}
> > +
> > +static void
> > +emit_pipeline_stat(struct brw_context *brw, drm_intel_bo *bo,
> > + int stream, int target, int idx)
> > +{
> > + /*
> > + * There are 2 confusing parts to implementing the various targets. The first is
> > + * the distinction between vertices submitted and primitives submitted. The
> > + * spec tries to clear this up, and luckily our hardware seems to understand
> > + * it:
> > + *
> > + * (8) What stage the VERTICES_SUBMITTED_ARB and PRIMITIVES_SUBMITTED_ARB
> > + * belong to? What do they count?
> > + *
> > + * DISCUSSION: There is no separate pipeline stage introduced in the
> > + * specification that matches D3D's "input assembler" stage. While the
> > + * latest version of the GL specification mentions a "vertex puller" stage
> > + * in the pipeline diagram, this stage does not have a corresponding chapter
> > + * in the specification that introduces it.
> > + *
> > + * RESOLVED: Introduce VERTICES_SUBMITTED_ARB and PRIMITIVES_SUBMITTED_ARB
> > + * in chapter 10, Vertex Specification and Drawing Command. They count the
> > + * total number of vertices and primitives processed by the GL. Including
> > + * multiple instances.
> > + *
> > + * The second confusion is the tessellation shader statistics. Our hardware has
> > + * no statistics specific to the TE unit. Ideally we could have the HS
> > + * primitives for TESS_CONTROL_SHADER_PATCHES_ARB, and the DS invocations as
> > + * the register for TESS_EVALUATION_SHADER_INVOCATIONS_ARB. Unfortunately we
> > + * don't have HS primitives, we only have HS invocations.
> > + */
> > +
> > + /* Everything except GEOMETRY_SHADER_INVOCATIONS can be kept in a simple
> > + * lookup table */
>
> */ should be on its own line (here and elsewhere).
>
> > + const uint32_t target_to_register[] = {
>
> static. Otherwise GCC emits code to initialize the array.
>
Sigh. I really wanted this not to be true.
> > + IA_VERTICES_COUNT, /* VERTICES_SUBMITTED */
> > + IA_PRIMITIVES_COUNT, /* PRIMITIVES_SUBMITTED */
> > + VS_INVOCATION_COUNT, /* VERTEX_SHADER_INVOCATIONS */
> > + 0, /* HS_INVOCATION_COUNT,*/ /* TESS_CONTROL_SHADER_PATCHES */
> > + 0, /* DS_INVOCATION_COUNT,*/ /* TESS_EVALUATION_SHADER_INVOCATIONS */
> > + GS_PRIMITIVES_COUNT, /* GEOMETRY_SHADER_PRIMITIVES_EMITTED */
> > + PS_INVOCATION_COUNT, /* FRAGMENT_SHADER_INVOCATIONS */
> > + 0, /* COMPUTE_SHADER_INVOCATIONS */
> > + CL_INVOCATION_COUNT, /* CLIPPING_INPUT_PRIMITIVES */
> > + CL_PRIMITIVES_COUNT, /* CLIPPING_OUTPUT_PRIMITIVES */
> > + GS_INVOCATION_COUNT /* This one is special... */
> > + };
> > + STATIC_ASSERT(ARRAY_SIZE(target_to_register) == MAX_PIPELINE_STATISTICS);
> > + uint32_t reg = target_to_register[pipeline_target_to_index(target)];
> > + assert(reg != 0);
> > +
> > + /* Emit a flush to make sure various parts of the pipeline are complete and
> > + * we get an accurate value */
> > + intel_batchbuffer_emit_mi_flush(brw);
> > +
> > + brw_store_register_mem64(brw, bo, reg, idx);
> > +}
> > +
> > +
> > /**
> > * Wait on the query object's BO and calculate the final result.
> > */
> > @@ -182,9 +249,29 @@ gen6_queryobj_get_results(struct gl_context *ctx,
> >
> > case GL_PRIMITIVES_GENERATED:
> > case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:
> > + case GL_VERTICES_SUBMITTED_ARB:
> > + case GL_PRIMITIVES_SUBMITTED_ARB:
> > + case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> > + case GL_GEOMETRY_SHADER_INVOCATIONS:
> > + case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> > + case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> > + case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> > query->Base.Result = results[1] - results[0];
> > break;
> >
> > + case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> > + query->Base.Result = (results[1] - results[0]);
> > + /* HW reports this count 4X the actual value and therefore SW must divide
> > + * the count by 4 for correct reporting.
> > + * XXX: Not in public docs?
> > + */
> > + if (brw->gen >= 8 || brw->is_haswell)
> > + query->Base.Result /= 4;
> > + break;
> > +
> > + case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> > + case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> > + case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> > default:
> > unreachable("Unrecognized query target in brw_queryobj_get_results()");
> > }
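As a standalone illustration of the result calculation in the hunk above: the
result is the difference between the end and begin snapshots, and the fragment
shader count is additionally divided by 4 where the hardware reports 4X the
actual value. This is only a sketch. pipeline_stat_result() and its two boolean
parameters are hypothetical names, not part of the patch; in the patch the
second condition is (brw->gen >= 8 || brw->is_haswell).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: turn the two 64-bit counter
 * snapshots (taken at Begin and End) into the value reported to the client.
 */
static uint64_t
pipeline_stat_result(uint64_t begin, uint64_t end,
                     bool is_ps_invocations, bool hw_reports_4x)
{
   /* Every implemented target reports the delta between the snapshots. */
   uint64_t result = end - begin;

   /* Only the fragment shader invocation count needs the correction, and
    * only on hardware that reports it at 4X the actual value.
    */
   if (is_ps_invocations && hw_reports_4x)
      result /= 4;

   return result;
}
```

For example, snapshots of 100 and 140 give 40 for any target, and 10 for the
fragment shader count on hardware that needs the correction.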
> > @@ -251,6 +338,20 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)
> > write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 0);
> > break;
> >
> > + case GL_VERTICES_SUBMITTED_ARB:
> > + case GL_PRIMITIVES_SUBMITTED_ARB:
> > + case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> > + case GL_GEOMETRY_SHADER_INVOCATIONS:
> > + case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> > + case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> > + case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> > + case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> > + emit_pipeline_stat(brw, query->bo, query->Base.Stream, query->Base.Target, 0);
> > + break;
> > +
> > + case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> > + case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> > + case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> > default:
> > unreachable("Unrecognized query target in brw_begin_query()");
> > }
> > @@ -289,6 +390,26 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q)
> > write_xfb_primitives_written(brw, query->bo, query->Base.Stream, 1);
> > break;
> >
> > + case GL_VERTICES_SUBMITTED_ARB:
> > + case GL_PRIMITIVES_SUBMITTED_ARB:
> > + case GL_VERTEX_SHADER_INVOCATIONS_ARB:
> > + case GL_TESS_CONTROL_SHADER_PATCHES_ARB:
> > + case GL_TESS_EVALUATION_SHADER_INVOCATIONS_ARB:
> > + case GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB:
> > + case GL_FRAGMENT_SHADER_INVOCATIONS_ARB:
> > + case GL_COMPUTE_SHADER_INVOCATIONS_ARB:
> > + case GL_CLIPPING_INPUT_PRIMITIVES_ARB:
> > + case GL_CLIPPING_OUTPUT_PRIMITIVES_ARB:
> > + emit_pipeline_stat(brw, query->bo,
> > + query->Base.Stream, query->Base.Target, 1);
> > + break;
> > + case GL_GEOMETRY_SHADER_INVOCATIONS:
> > + /* GEOMETRY_SHADER_INVOCATIONS has a weirdly numbered target */
> > + emit_pipeline_stat(brw, query->bo, query->Base.Stream,
> > + GL_VERTICES_SUBMITTED_ARB + MAX_PIPELINE_STATISTICS - 1,
>
> Is this actually necessary? It looks like this parameter is eventually
> just passed to pipeline_target_to_index, which does the right thing with
> GL_GEOMETRY_SHADER_INVOCATIONS.
>
You are correct. This code changed a bit during the various internal versions
and I forgot to update this part. I'll fix this and audit the other GS special
cases while I am there.
Thanks. All the requests look good, and I'll post it in v3.
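For reference, here is a minimal standalone sketch of the index mapping as I
understand it, with the table made static const as Ian suggested. The token
values are the ones from the XML above. MAX_PIPELINE_STATISTICS is assumed to
be 11, matching the number of table entries. pipeline_target_name() is a
hypothetical helper that returns register names as strings, purely so the
sketch does not have to invent register offsets; NULL stands in for the
entries the patch leaves as 0.

```c
#include <stddef.h>

/* Token values as listed in the patch's XML. */
#define GL_VERTICES_SUBMITTED_ARB       0x82EE
#define GL_COMPUTE_SHADER_INVOCATIONS_ARB 0x82F5
#define GL_GEOMETRY_SHADER_INVOCATIONS  0x887F

/* Assumption: ten contiguous ARB tokens plus the one out-of-range token. */
#define MAX_PIPELINE_STATISTICS 11

static int
pipeline_target_to_index(int target)
{
   /* GEOMETRY_SHADER_INVOCATIONS is not contiguous with the other tokens,
    * so it is given the last slot in the table.
    */
   if (target == GL_GEOMETRY_SHADER_INVOCATIONS)
      return MAX_PIPELINE_STATISTICS - 1;

   return target - GL_VERTICES_SUBMITTED_ARB;
}

/* Hypothetical helper: map a query target to the name of the counter
 * register it would read. Returns NULL for unimplemented or unknown targets.
 */
static const char *
pipeline_target_name(int target)
{
   /* static const: the table lives in read-only data instead of being
    * rebuilt on the stack on every call.
    */
   static const char *const names[] = {
      "IA_VERTICES_COUNT",   /* VERTICES_SUBMITTED */
      "IA_PRIMITIVES_COUNT", /* PRIMITIVES_SUBMITTED */
      "VS_INVOCATION_COUNT", /* VERTEX_SHADER_INVOCATIONS */
      NULL,                  /* TESS_CONTROL_SHADER_PATCHES */
      NULL,                  /* TESS_EVALUATION_SHADER_INVOCATIONS */
      "GS_PRIMITIVES_COUNT", /* GEOMETRY_SHADER_PRIMITIVES_EMITTED */
      "PS_INVOCATION_COUNT", /* FRAGMENT_SHADER_INVOCATIONS */
      NULL,                  /* COMPUTE_SHADER_INVOCATIONS */
      "CL_INVOCATION_COUNT", /* CLIPPING_INPUT_PRIMITIVES */
      "CL_PRIMITIVES_COUNT", /* CLIPPING_OUTPUT_PRIMITIVES */
      "GS_INVOCATION_COUNT"  /* GEOMETRY_SHADER_INVOCATIONS (special) */
   };
   _Static_assert(sizeof(names) / sizeof(names[0]) == MAX_PIPELINE_STATISTICS,
                  "table must cover every pipeline statistic");

   int idx = pipeline_target_to_index(target);
   if (idx < 0 || idx >= MAX_PIPELINE_STATISTICS)
      return NULL;

   return names[idx];
}
```

Passing GL_GEOMETRY_SHADER_INVOCATIONS directly and passing
GL_VERTICES_SUBMITTED_ARB + MAX_PIPELINE_STATISTICS - 1 both land on the last
slot, which is why the remapping in gen6_end_query() is redundant.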
[snip]