<div dir="ltr">On 17 May 2013 10:18, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Now that we have hardware contexts and can use MI_STORE_REGISTER_MEM,<br>
we can use the GPU's pipeline statistics counters rather than going out<br>
of our way to count primitives in software.<br>
<br>
Aside from being simpler, this also paves the way for Geometry Shaders,<br>
which can output an arbitrary number of primitives on the GPU.<br></blockquote><div><br></div><div>While you're giving rationales, another justification is that this will let us use hardware primitive restart support when transform feedback and/or primitive counting is in use.<br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
The GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query is easy: it<br>
corresponds to the SO_NUM_PRIMS_WRITTEN/SO_NUM_PRIMS_WRITTEN0_IVB<br>
counters.<br>
<br>
The GL_PRIMITIVES_GENERATED query is trickier. Gen provides several<br>
statistics registers which /almost/ match the semantics required:<br>
- IA_PRIMITIVES_COUNT<br>
The number of primitives fetched by the VF or IA (input assembler).<br>
This undercounts when GS is enabled, as it can output many primitives.<br>
- GS_PRIMITIVES_COUNT<br>
The number of primitives output by the GS. Unfortunately, this<br>
doesn't increment unless the GS unit is actually enabled, and it<br>
usually isn't.<br>
- SO_PRIM_STORAGE_NEEDED*_IVB<br>
The amount of space needed to write primitives output by transform<br>
feedback. These naturally only work when transform feedback is on.<br>
We'd also have to add the counters for all four streams.<br>
- CL_INVOCATION_COUNT<br>
The number of primitives processed by the clipper. This doesn't work<br>
if the GS or SOL throw away primitives for rasterizer discard.<br>
However, it does increment even if the clipper is in REJECT_ALL mode.<br>
<br>
Dynamically switching between counters would be painfully complicated,<br>
especially since GS, rasterizer discard, and transform feedback can all<br>
be switched on and off repeatedly during a single query.<br>
<br>
The most usable counter is CL_INVOCATION_COUNT. The previous two<br>
patches reworked rasterizer discard support so that all primitives hit<br>
the clipper, making this work.<br>
<br>
Cc: Eric Anholt <<a href="mailto:eric@anholt.net">eric@anholt.net</a>><br>
Cc: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
Signed-off-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
---<br>
src/mesa/drivers/dri/i965/gen6_queryobj.c | 105 +++++++++++++++++++-----------<br>
1 file changed, 66 insertions(+), 39 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/gen6_queryobj.c b/src/mesa/drivers/dri/i965/gen6_queryobj.c<br>
index 28af8d7..a032227 100644<br>
--- a/src/mesa/drivers/dri/i965/gen6_queryobj.c<br>
+++ b/src/mesa/drivers/dri/i965/gen6_queryobj.c<br>
@@ -94,6 +94,57 @@ write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)<br>
ADVANCE_BATCH();<br>
}<br>
<br>
+/*<br>
+ * Write an arbitrary 64-bit register to a buffer via MI_STORE_REGISTER_MEM.<br>
+ *<br>
+ * Only TIMESTAMP and PS_DEPTH_COUNT have special PIPE_CONTROL support; other<br>
+ * counters have to be read via the generic MI_STORE_REGISTER_MEM. This<br>
+ * function also performs a pipeline flush for proper synchronization.<br>
+ */<br>
+static void<br>
+write_reg(struct intel_context *intel,<br>
+ drm_intel_bo *query_bo, uint32_t reg, int idx)<br>
+{<br>
+ assert(intel->gen >= 6);<br>
+<br>
+ intel_batchbuffer_emit_mi_flush(intel);<br>
+<br>
+ /* MI_STORE_REGISTER_MEM only stores a single 32-bit value, so to<br>
+ * read a full 64-bit register, we need to do two of them.<br>
+ */<br>
+ BEGIN_BATCH(3);<br>
+ OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));<br>
+ OUT_BATCH(reg);<br>
+ OUT_RELOC(query_bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,<br>
+ idx * sizeof(uint64_t));<br>
+ ADVANCE_BATCH();<br>
+<br>
+ BEGIN_BATCH(3);<br>
+ OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));<br>
+ OUT_BATCH(reg + sizeof(uint32_t));<br>
+ OUT_RELOC(query_bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,<br>
+ sizeof(uint32_t) + idx * sizeof(uint64_t));<br>
+ ADVANCE_BATCH();<br>
+}<br>
+<br>
+static void<br>
+write_primitives_generated(struct intel_context *intel,<br>
+ drm_intel_bo *query_bo, int idx)<br>
+{<br>
+ write_reg(intel, query_bo, CL_INVOCATION_COUNT, idx);<br>
+}<br>
+<br>
+static void<br>
+write_xfb_primitives_written(struct intel_context *intel,<br>
+ drm_intel_bo *query_bo, int idx)<br>
+{<br>
+ if (intel->gen >= 7) {<br>
+ write_reg(intel, query_bo, SO_NUM_PRIMS_WRITTEN0_IVB, idx);<br>
+ } else {<br>
+ write_reg(intel, query_bo, SO_NUM_PRIMS_WRITTEN, idx);<br>
+ }<br>
+}<br>
+<br>
/**<br>
* Wait on the query object's BO and calculate the final result.<br>
*/<br>
@@ -152,21 +203,20 @@ gen6_queryobj_get_results(struct gl_context *ctx,<br>
query->Base.Result &= (1ull << 36) - 1;<br>
break;<br>
<br>
- case GL_SAMPLES_PASSED_ARB:<br>
- query->Base.Result += results[1] - results[0];<br>
- break;<br>
-<br>
case GL_ANY_SAMPLES_PASSED:<br>
case GL_ANY_SAMPLES_PASSED_CONSERVATIVE:<br>
- query->Base.Result = results[0] != results[1];<br>
+ if (results[0] != results[1])<br>
+ query->Base.Result = true;<br></blockquote><div><br></div><div>This looks like it got squashed into the wrong patch of the series.<br><br></div><div>With that fixed, this patch is:<br><br>Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
break;<br>
<br>
+ case GL_SAMPLES_PASSED_ARB:<br>
case GL_PRIMITIVES_GENERATED:<br>
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:<br>
- /* We don't actually query the hardware for this value, so query->bo<br>
- * should always be NULL and execution should never reach here.<br>
+ /* We need to use += rather than = here since some BLT-based operations<br>
+ * may have added additional samples to our occlusion query value.<br>
+ * It shouldn't matter for geometry queries, but is harmless.<br>
*/<br>
- assert(!"Unreachable");<br>
+ query->Base.Result += results[1] - results[0];<br>
break;<br>
<br>
default:<br>
@@ -191,10 +241,13 @@ gen6_queryobj_get_results(struct gl_context *ctx,<br>
static void<br>
gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)<br>
{<br>
- struct brw_context *brw = brw_context(ctx);<br>
struct intel_context *intel = intel_context(ctx);<br>
struct brw_query_object *query = (struct brw_query_object *)q;<br>
<br>
+ /* Since we're starting a new query, we need to throw away old results. */<br>
+ drm_intel_bo_unreference(query->bo);<br>
+ query->bo = drm_intel_bo_alloc(intel->bufmgr, "query results", 4096, 4096);<br>
+<br>
switch (query->Base.Target) {<br>
case GL_TIME_ELAPSED:<br>
/* For timestamp queries, we record the starting time right away so that<br>
@@ -216,36 +269,21 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)<br>
* obtain the time elapsed. Notably, this includes time elapsed while<br>
* the system was doing other work, such as running other applications.<br>
*/<br>
- drm_intel_bo_unreference(query->bo);<br>
- query->bo = drm_intel_bo_alloc(intel->bufmgr, "timer query", 4096, 4096);<br>
write_timestamp(intel, query->bo, 0);<br>
break;<br>
<br>
case GL_ANY_SAMPLES_PASSED:<br>
case GL_ANY_SAMPLES_PASSED_CONSERVATIVE:<br>
case GL_SAMPLES_PASSED_ARB:<br>
- /* Since we're starting a new query, we need to be sure to throw away<br>
- * any previous occlusion query results.<br>
- */<br>
- drm_intel_bo_unreference(query->bo);<br>
- query->bo = drm_intel_bo_alloc(intel->bufmgr, "occl. query", 4096, 4096);<br>
write_depth_count(intel, query->bo, 0);<br>
break;<br>
<br>
case GL_PRIMITIVES_GENERATED:<br>
- /* We don't actually query the hardware for this value; we keep track of<br>
- * it a software counter. So just reset the counter.<br>
- */<br>
- brw->sol.primitives_generated = 0;<br>
- brw->sol.counting_primitives_generated = true;<br>
+ write_primitives_generated(intel, query->bo, 0);<br>
break;<br>
<br>
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:<br>
- /* We don't actually query the hardware for this value; we keep track of<br>
- * it a software counter. So just reset the counter.<br>
- */<br>
- brw->sol.primitives_written = 0;<br>
- brw->sol.counting_primitives_written = true;<br>
+ write_xfb_primitives_written(intel, query->bo, 0);<br>
break;<br>
<br>
default:<br>
@@ -265,7 +303,6 @@ gen6_begin_query(struct gl_context *ctx, struct gl_query_object *q)<br>
static void<br>
gen6_end_query(struct gl_context *ctx, struct gl_query_object *q)<br>
{<br>
- struct brw_context *brw = brw_context(ctx);<br>
struct intel_context *intel = intel_context(ctx);<br>
struct brw_query_object *query = (struct brw_query_object *)q;<br>
<br>
@@ -281,21 +318,11 @@ gen6_end_query(struct gl_context *ctx, struct gl_query_object *q)<br>
break;<br>
<br>
case GL_PRIMITIVES_GENERATED:<br>
- /* We don't actually query the hardware for this value; we keep track of<br>
- * it in a software counter. So just read the counter and store it in<br>
- * the query object.<br>
- */<br>
- query->Base.Result = brw->sol.primitives_generated;<br>
- brw->sol.counting_primitives_generated = false;<br>
+ write_primitives_generated(intel, query->bo, 1);<br>
break;<br>
<br>
case GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN:<br>
- /* We don't actually query the hardware for this value; we keep track of<br>
- * it in a software counter. So just read the counter and store it in<br>
- * the query object.<br>
- */<br>
- query->Base.Result = brw->sol.primitives_written;<br>
- brw->sol.counting_primitives_written = false;<br>
+ write_xfb_primitives_written(intel, query->bo, 1);<br>
break;<br>
<br>
default:<br>
<span class="HOEnZb"><font color="#888888">--<br>
1.8.2.3<br>
<br>
</font></span></blockquote></div><br></div></div>