[Mesa-dev] [Mesa-stable] [PATCH] i965/gen8+: Do full stall when switching pipeline

Pohjolainen, Topi topi.pohjolainen at gmail.com
Sun Mar 19 08:45:59 UTC 2017


On Thu, Mar 16, 2017 at 01:02:00PM -0700, Francisco Jerez wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
> 
> > Thanks for tracking this down!
> >
> > Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> >
> > I'm not sure how we missed this when CS was brought up.  Oh well.
> >
> 
> I don't think we did.  The hardware docs are somewhat contradictory
> about whether this is required:
> 
> | Render CS Only: SW must always program PIPE_CONTROL with CS Stall and
> | Render Target Cache Flush Enable set prior to programming
> | PIPELINE_SELECT command for GPGPU workloads i.e when pipeline mode is
> | set to GPGPU. This is required to achieve better GPGPU preemption
> | latencies for certain programming sequences. If programming
> | PIPE_CONTROL has performance implications then preemption latencies
> | can be trade off against performance by not implementing this
> | programming note.
> 
> This sounds like more of a suggestion (which potentially trades off
> throughput for GPGPU latency due to the synchronous execution of the
> compute shader) than a requirement, so the stall may not be required and
> is likely to impact performance.
> 
> I wonder if the following workaround would be as effective fixing the
> bug linked below:
> 
>  - Speculatively flush the render and depth caches with a pipelined
>    PIPE_CONTROL right before PIPELINE_SELECT (note that this still
>    allows some parallelism between 3D and GPGPU workloads).
> 
>  - On brw_render_cache_set_check_flush(), if the GPGPU pipeline is
>    selected, emit an HDC flush+CS stall instead of the depth/render
>    target flush+CS stall.

This looked to be sufficient also on my SKL at least, note that this leaves
the CS stall bit out from compute tex cache invalidate.

While I'm comfortable with the fix lining gen8+ with earlier gens (which is
according to the spec), this here looks more an optimization which should be
applied to all gens. It also requires more in-depth understanding of the
pipeline switch (which I don't think I have). 

I have pushed my patch already before I actually saw your response. This here
in my opinion requires some sort of comment telling why we did things
differently then described in the spec. I'd be happier if you followed up on
this instead of me.

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 84f0c18..9704a9f 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -844,6 +844,13 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
          brw->ctx.NewDriverState |= BRW_NEW_CC_STATE;
       }
 
+      const unsigned dc_flush = brw->last_pipeline == BRW_COMPUTE_PIPELINE ?
+                                PIPE_CONTROL_DATA_CACHE_FLUSH :
+                                PIPE_CONTROL_RENDER_TARGET_FLUSH |
+                                PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+
+      brw_emit_pipe_control_flush(brw, dc_flush | PIPE_CONTROL_CS_STALL);
+
    } else if (brw->gen >= 6) {
       /* From "BXML <C2><BB> GT <C2><BB> MI <C2><BB> vol1a GPU Overview <C2>
<BB> [Instruction]
        * PIPELINE_SELECT [DevBWR+]":
diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c
index de0cd6a..3126be6 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -1082,10 +1082,16 @@ brw_render_cache_set_check_flush(struct brw_context *brw, drm_intel_bo *bo)
       return;
 
    if (brw->gen >= 6) {
-      brw_emit_pipe_control_flush(brw,
-                                  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
-                                  PIPE_CONTROL_RENDER_TARGET_FLUSH |
-                                  PIPE_CONTROL_CS_STALL);
+      if (brw->last_pipeline == BRW_COMPUTE_PIPELINE) {
+         brw_emit_pipe_control_flush(brw,
+                                     PIPE_CONTROL_DATA_CACHE_FLUSH |
+                                     PIPE_CONTROL_CS_STALL);
+      } else {
+         brw_emit_pipe_control_flush(brw,
+                                     PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                                     PIPE_CONTROL_RENDER_TARGET_FLUSH |
+                                     PIPE_CONTROL_CS_STALL);
+      }
 
       brw_emit_pipe_control_flush(brw,
                                   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |

> 
> > On Thu, Mar 16, 2017 at 1:24 AM, Topi Pohjolainen <
> > topi.pohjolainen at gmail.com> wrote:
> >
> >> just as earlier gens do.
> >>
> >> CC:  Jason Ekstrand <jason at jlekstrand.net>
> >> Cc: "17.0 13.0" <mesa-stable at lists.freedesktop.org>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96743
> >>
> >> Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_misc_state.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c
> >> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> >> index 84f0c18..1cf6b04 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> >> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> >> @@ -843,8 +843,9 @@ brw_emit_select_pipeline(struct brw_context *brw,
> >> enum brw_pipeline pipeline)
> >>
> >>           brw->ctx.NewDriverState |= BRW_NEW_CC_STATE;
> >>        }
> >> +   }
> >>
> >> -   } else if (brw->gen >= 6) {
> >> +   if (brw->gen >= 6) {
> >>        /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
> >>         * PIPELINE_SELECT [DevBWR+]":
> >>         *
> >> --
> >> 2.9.3
> >>
> >>
> > _______________________________________________
> > mesa-stable mailing list
> > mesa-stable at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-stable





More information about the mesa-dev mailing list