[Mesa-stable] [PATCH] i965/gen8+: Do full stall when switching pipeline
Pohjolainen, Topi
topi.pohjolainen at gmail.com
Sun Mar 19 09:01:58 UTC 2017
On Sun, Mar 19, 2017 at 10:45:59AM +0200, Pohjolainen, Topi wrote:
> On Thu, Mar 16, 2017 at 01:02:00PM -0700, Francisco Jerez wrote:
> > Jason Ekstrand <jason at jlekstrand.net> writes:
> >
> > > Thanks for tracking this down!
> > >
> > > Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> > >
> > > I'm not sure how we missed this when CS was brought up. Oh well.
> > >
> >
> > I don't think we did. The hardware docs are somewhat contradictory
> > about whether this is required:
> >
> > | Render CS Only: SW must always program PIPE_CONTROL with CS Stall and
> > | Render Target Cache Flush Enable set prior to programming
> > | PIPELINE_SELECT command for GPGPU workloads i.e when pipeline mode is
> > | set to GPGPU. This is required to achieve better GPGPU preemption
> > | latencies for certain programming sequences. If programming
> > | PIPE_CONTROL has performance implications then preemption latencies
> > | can be trade off against performance by not implementing this
> > | programming note.
> >
> > This sounds like more of a suggestion (which potentially trades off
> > throughput for GPGPU latency due to the synchronous execution of the
> > compute shader) than a requirement, so the stall may not be required and
> > is likely to impact performance.
> >
> > I wonder if the following workaround would be as effective fixing the
> > bug linked below:
> >
> > - Speculatively flush the render and depth caches with a pipelined
> > PIPE_CONTROL right before PIPELINE_SELECT (note that this still
> > allows some parallelism between 3D and GPGPU workloads).
> >
> > - On brw_render_cache_set_check_flush(), if the GPGPU pipeline is
> > selected, emit an HDC flush+CS stall instead of the depth/render
> > target flush+CS stall.
>
> This looked to be sufficient also on my SKL at least, note that this leaves
> the CS stall bit out from compute tex cache invalidate.
>
> While I'm comfortable with the fix lining gen8+ with earlier gens (which is
> according to the spec), this here looks more an optimization which should be
> applied to all gens. It also requires more in-depth understanding of the
> pipeline switch (which I don't think I have).
>
> I have pushed my patch already before I actually saw your response. This here
> in my opinion requires some sort of comment telling why we did things
> differently then described in the spec. I'd be happier if you followed up on
> this instead of me.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
> index 84f0c18..9704a9f 100644
> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> @@ -844,6 +844,13 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
> brw->ctx.NewDriverState |= BRW_NEW_CC_STATE;
> }
>
> + const unsigned dc_flush = brw->last_pipeline == BRW_COMPUTE_PIPELINE ?
> + PIPE_CONTROL_DATA_CACHE_FLUSH :
> + PIPE_CONTROL_RENDER_TARGET_FLUSH |
> + PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +
> + brw_emit_pipe_control_flush(brw, dc_flush | PIPE_CONTROL_CS_STALL);
> +
> } else if (brw->gen >= 6) {
> /* From "BXML <C2><BB> GT <C2><BB> MI <C2><BB> vol1a GPU Overview <C2>
> <BB> [Instruction]
> * PIPELINE_SELECT [DevBWR+]":
> diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c
> index de0cd6a..3126be6 100644
> --- a/src/mesa/drivers/dri/i965/intel_fbo.c
> +++ b/src/mesa/drivers/dri/i965/intel_fbo.c
> @@ -1082,10 +1082,16 @@ brw_render_cache_set_check_flush(struct brw_context *brw, drm_intel_bo *bo)
> return;
>
> if (brw->gen >= 6) {
> - brw_emit_pipe_control_flush(brw,
> - PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> - PIPE_CONTROL_RENDER_TARGET_FLUSH |
> - PIPE_CONTROL_CS_STALL);
> + if (brw->last_pipeline == BRW_COMPUTE_PIPELINE) {
> + brw_emit_pipe_control_flush(brw,
> + PIPE_CONTROL_DATA_CACHE_FLUSH |
> + PIPE_CONTROL_CS_STALL);
> + } else {
> + brw_emit_pipe_control_flush(brw,
> + PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> + PIPE_CONTROL_RENDER_TARGET_FLUSH |
> + PIPE_CONTROL_CS_STALL);
> + }
Flushing depth and rt in compute is of course wrong and this bit is genuine
fix. I'll write a patch for it. I'll just need to study a little what the
render cache means in compute context - initially I feel we shouldn't actually
regard that at all in compute.
>
> brw_emit_pipe_control_flush(brw,
> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>
> >
> > > On Thu, Mar 16, 2017 at 1:24 AM, Topi Pohjolainen <
> > > topi.pohjolainen at gmail.com> wrote:
> > >
> > >> just as earlier gens do.
> > >>
> > >> CC: Jason Ekstrand <jason at jlekstrand.net>
> > >> Cc: "17.0 13.0" <mesa-stable at lists.freedesktop.org>
> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96743
> > >>
> > >> Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > >> ---
> > >> src/mesa/drivers/dri/i965/brw_misc_state.c | 3 ++-
> > >> 1 file changed, 2 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c
> > >> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> > >> index 84f0c18..1cf6b04 100644
> > >> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> > >> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> > >> @@ -843,8 +843,9 @@ brw_emit_select_pipeline(struct brw_context *brw,
> > >> enum brw_pipeline pipeline)
> > >>
> > >> brw->ctx.NewDriverState |= BRW_NEW_CC_STATE;
> > >> }
> > >> + }
> > >>
> > >> - } else if (brw->gen >= 6) {
> > >> + if (brw->gen >= 6) {
> > >> /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
> > >> * PIPELINE_SELECT [DevBWR+]":
> > >> *
> > >> --
> > >> 2.9.3
> > >>
> > >>
> > > _______________________________________________
> > > mesa-stable mailing list
> > > mesa-stable at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
>
>
>
More information about the mesa-stable
mailing list