[Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

Wed Jun 14 20:41:56 UTC 2017

On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> From: Topi Pohjolainen <topi.pohjolainen at intel.com>
> 
> v2 (Jason Ekstrand):
>  - Take a flags parameter to control the flushes
>  - Refactoring
> 
> Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++++++++++++++++++++++++++-
>  2 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 7b9be8a..b137409 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
>  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
>                                   struct brw_bo *bo, uint32_t offset,
>                                   uint64_t imm);
> +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
>  void brw_emit_mi_flush(struct brw_context *brw);
>  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
>  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 39bb9c7..338e4fc 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
>                                 brw->workaround_bo, 0, 0);
>  }
>  
> -
>  /**
>   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
>   * implementing two workarounds on gen6.  From section 1.4.7.1
> @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
>                                 brw->workaround_bo, 0, 0);
>  }
>  
> +/*
> + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> + *
> + *  Write synchronization is a special case of end-of-pipe
> + *  synchronization that requires that the render cache and/or depth
> + *  related caches are flushed to memory, where the data will become
> + *  globally visible. This type of synchronization is required prior to
> + *  SW (CPU) actually reading the result data from memory, or initiating
> + *  an operation that will use as a read surface (such as a texture
> + *  surface) a previous render target and/or depth/stencil buffer
> + *
> + *
> + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> + *
> + *  Exercising the write cache flush bits (Render Target Cache Flush
> + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> + *  ensures the write caches are flushed and doesn't guarantee the data
> + *  is globally visible.
> + *
> + *  SW can track the completion of the end-of-pipe-synchronization by
> + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> + *  Data" in the PIPE_CONTROL command. 
> + */
> +void
> +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> +{
> +   if (brw->gen >= 6) {
> +      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> +       *
> +       *    "The most common action to perform upon reaching a synchronization
> +       *    point is to write a value out to memory. An immediate value
> +       *    (included with the synchronization command) may be written."
> +       *
> +       *
> +       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> +       *
> +       *    "In case the data flushed out by the render engine is to be read
> +       *    back in to the render engine in coherent manner, then the render
> +       *    engine has to wait for the fence completion before accessing the
> +       *    flushed data. This can be achieved by following means on various
> +       *    products: PIPE_CONTROL command with CS Stall and the required
> +       *    write caches flushed with Post-Sync-Operation as Write Immediate
> +       *    Data.
> +       *
> +       *    Example:
> +       *       - Workload-1 (3D/GPGPU/MEDIA)
> +       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> +       *         Data, Required Write Cache Flush bits set)
> +       *       - Workload-2 (Can use the data produce or output by Workload-1)
> +       */
> +      brw_emit_pipe_control_write(brw,
> +                                  flags | PIPE_CONTROL_CS_STALL |
> +                                  PIPE_CONTROL_WRITE_IMMEDIATE,
> +                                  brw->workaround_bo, 0, 0);
> +
> +      if (brw->is_haswell) {
> +         /* Haswell needs additional work-arounds:
> +          *
> +          * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> +          *
> +          *    Option 1:
> +          *    PIPE_CONTROL command with the CS Stall and the required write
> +          *    caches flushed with Post-SyncOperation as Write Immediate Data
> +          *    followed by eight dummy MI_STORE_DATA_IMM (write to scratch
> +          *    space) commands.
> +          *
> +          *    Example:
> +          *       - Workload-1
> +          *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
> +          *         Immediate Data, Required Write Cache Flush bits set)
> +          *       - MI_STORE_DATA_IMM (8 times) (Dummy data, Scratch Address)
> +          *       - Workload-2 (Can use the data produce or output by
> +          *         Workload-1)
> +          *
> +          * Unfortunately, both the PRMs and the internal docs are a bit
> +          * out-of-date in this regard.  What the windows driver does (and
> +          * this appears to actually work) is to emit a register read from the
> +          * memory address written by the pipe control above.
> +          *
> +          * What register we load into doesn't matter.  We choose an indirect
> +          * rendering register because we know it always exists and doesn't
> +          * require command parser support.  This is perfectly safe to do
> +          * since we always re-load all of the indirect draw registers right
> +          * before 3DPRIMITIVE when needed anyway.
> +          */
> +         brw_load_register_mem(brw, GEN7_3DPRIM_START_INSTANCE,
> +                               brw->workaround_bo,
> +                               I915_GEM_DOMAIN_INSTRUCTION, 0, 0);

This does require command parser support.  If the command parser is not
enabled, your MI_LOAD_REGISTER_MEM will be converted to MI_NOOP.  That means
the workaround effectively won't happen for users with kernels older than 4.2
(specifically, kernel commit 245054a1fe33c06ad233e0d58a27ec7b64db9284).

Last I checked, ChromeOS in particular was running a 3.8-based kernel with
no command parser, which means they won't get this bug fix.  For them, at
least, applying a kernel patch may actually be the faster route to getting
this bug fixed.  For normal Linux distro users, this Mesa patch is probably
the fastest approach.  ChromeOS should update their kernel to 4.4+ anyway.

Please update the comment: it currently says the register load "doesn't
require command parser support", which isn't true.
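
To make the failure mode concrete, here is a minimal stand-alone sketch of
the three outcomes on gen6+.  It is not Mesa code: the struct, its fields,
the enum, and the function name are all hypothetical and exist only for
illustration, and it assumes the only inputs that matter are "is this
Haswell" and "will the kernel let MI_LOAD_REGISTER_MEM through".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant driver/kernel state (gen6+ only). */
struct sync_caps {
   bool is_haswell;       /* the LRM read-back is only emitted on Haswell */
   bool has_cmd_parser;   /* assumption: false means LRM is turned into MI_NOOP */
};

enum sync_result {
   SYNC_PIPE_CONTROL_ONLY,   /* non-Haswell: PIPE_CONTROL write is all we emit */
   SYNC_WITH_LRM_READBACK,   /* Haswell, LRM actually executes */
   SYNC_LRM_DROPPED          /* Haswell, LRM became MI_NOOP: no workaround */
};

/* Classify what the end-of-pipe sync effectively does on the hardware. */
static enum sync_result
classify_end_of_pipe_sync(const struct sync_caps *caps)
{
   assert(caps != NULL);

   if (!caps->is_haswell)
      return SYNC_PIPE_CONTROL_ONLY;

   return caps->has_cmd_parser ? SYNC_WITH_LRM_READBACK : SYNC_LRM_DROPPED;
}
```

The SYNC_LRM_DROPPED case is the one I'm worried about: the batch looks the
same from userspace, but the read-back never happens.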

> +      }
> +   } else {
> +      /* On gen4-5, a regular pipe control seems to suffice. */
> +      brw_emit_pipe_control_flush(brw, flags);
> +   }
> +}
> +
>  /* Emit a pipelined flush to either flush render and texture cache for
>   * reading from a FBO-drawn texture, or flush so that frontbuffer
>   * render appears on the screen in DRI1.
> 
