[Mesa-dev] [PATCH 7/7] i965: Rework the extra flushes surrounding occlusion queries.

Daniel Vetter daniel at ffwll.ch
Wed Aug 8 00:41:44 PDT 2012


On Tue, Aug 07, 2012 at 04:05:33PM -0700, Kenneth Graunke wrote:
> Separate out the depth stall from the depth count write.  Workarounds
> say that a depth stall needs to be preceeded with a non-zero post-sync
> op (in this case, the depth count write).  Also, before the non-zero
> post-sync op, we need a CS stall, which needs a stall at scoreboard.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>

In my understanding of Bspec (haven't done any experiments on hw) we need
to set the depth stall bit on the pipe_control with the depth_count write
(bspec for both snb&ivb even says that depth stall should be disable if
post sync op != write_depth). So I think we need to keep these two
together and simply emit the entire nonzero postsync op workaround on
gen6, like we already do for render cache flushes.

In my reading of bspec, no such workaround is required on gen7+
-Daniel

> ---
>  src/mesa/drivers/dri/i965/brw_queryobj.c | 36 ++++++++++++++++++++++++--------
>  1 file changed, 27 insertions(+), 9 deletions(-)
> 
> This does remove the CS stall on Ivybridge.
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
> index 1e03d08..4c561ad 100644
> --- a/src/mesa/drivers/dri/i965/brw_queryobj.c
> +++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
> @@ -91,17 +91,24 @@ static void
>  write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
>  {
>     if (intel->gen >= 6) {
> -      BEGIN_BATCH(9);
> -
> -      /* workaround: CS stall required before depth stall. */
> -      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
> -      OUT_BATCH(PIPE_CONTROL_CS_STALL);
> -      OUT_BATCH(0); /* write address */
> -      OUT_BATCH(0); /* write data */
> +      /* Emit Sandybridge workaround flush: */
> +      if (intel->gen == 6) {
> +         /* The timestamp write below is a non-zero post-sync op, which on
> +          * Gen6 necessitates a CS stall.  CS stalls need stall at scoreboard
> +          * set.  See the comments for intel_emit_post_sync_nonzero_flush().
> +          */
> +         BEGIN_BATCH(4);
> +         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
> +         OUT_BATCH(PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD);
> +         OUT_BATCH(0);
> +         OUT_BATCH(0);
> +         ADVANCE_BATCH();
> +      }
>  
> +      /* Emit the actual depth count write: */
> +      BEGIN_BATCH(5);
>        OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
> -      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
> -                PIPE_CONTROL_WRITE_DEPTH_COUNT);
> +      OUT_BATCH(PIPE_CONTROL_WRITE_DEPTH_COUNT);
>        OUT_RELOC(query_bo,
>                  I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
>                  PIPE_CONTROL_GLOBAL_GTT_WRITE |
> @@ -109,6 +116,17 @@ write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
>        OUT_BATCH(0);
>        OUT_BATCH(0);
>        ADVANCE_BATCH();
> +
> +      /* We need to emit a depth stall to get the right value for the depth
> +       * count.  As a workaround this needs a preceeding PIPE_CONTROL with a
> +       * non-zero post-sync op.  The depth count write above does that for us.
> +       */
> +      BEGIN_BATCH(4);
> +      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
> +      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL);
> +      OUT_BATCH(0);
> +      OUT_BATCH(0);
> +      ADVANCE_BATCH();
>     } else {
>        BEGIN_BATCH(4);
>        OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
> -- 
> 1.7.11.4
> 

-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


More information about the mesa-dev mailing list