[Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

Thu Jun 15 15:59:19 UTC 2017

On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson <chris at chris-wilson.co.uk>
wrote:

> Quoting Kenneth Graunke (2017-06-14 21:41:56)
> > On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > > From: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > >
> > > v2 (Jason Ekstrand):
> > >  - Take a flags parameter to control the flushes
> > >  - Refactoring
> > >
> > > Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
> > >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96
> +++++++++++++++++++++++++++-
> > >  2 files changed, 96 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> > > index 7b9be8a..b137409 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct
> brw_context *brw, uint32_t flags);
> > >  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t
> flags,
> > >                                   struct brw_bo *bo, uint32_t offset,
> > >                                   uint64_t imm);
> > > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t
> flags);
> > >  void brw_emit_mi_flush(struct brw_context *brw);
> > >  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> > >  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > index 39bb9c7..338e4fc 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> > >                                 brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > -
> > >  /**
> > >   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> > >   * implementing two workarounds on gen6.  From section 1.4.7.1
> > > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct
> brw_context *brw)
> > >                                 brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > +/*
> > > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe
> Synchronization":
> > > + *
> > > + *  Write synchronization is a special case of end-of-pipe
> > > + *  synchronization that requires that the render cache and/or depth
> > > + *  related caches are flushed to memory, where the data will become
> > > + *  globally visible. This type of synchronization is required prior
> to
> > > + *  SW (CPU) actually reading the result data from memory, or
> initiating
> > > + *  an operation that will use as a read surface (such as a texture
> > > + *  surface) a previous render target and/or depth/stencil buffer
> > > + *
> > > + *
> > > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > > + *
> > > + *  Exercising the write cache flush bits (Render Target Cache Flush
> > > + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > > + *  ensures the write caches are flushed and doesn't guarantee the
> data
> > > + *  is globally visible.
> > > + *
> > > + *  SW can track the completion of the end-of-pipe-synchronization by
> > > + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> > > + *  Data" in the PIPE_CONTROL command.
> > > + */
> > > +void
> > > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > > +{
> > > +   if (brw->gen >= 6) {
> > > +      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to
> Memory":
> > > +       *
> > > +       *    "The most common action to perform upon reaching a
> synchronization
> > > +       *    point is to write a value out to memory. An immediate
> value
> > > +       *    (included with the synchronization command) may be
> written."
> > > +       *
> > > +       *
> > > +       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > > +       *
> > > +       *    "In case the data flushed out by the render engine is to
> be read
> > > +       *    back in to the render engine in coherent manner, then the
> render
> > > +       *    engine has to wait for the fence completion before
> accessing the
> > > +       *    flushed data. This can be achieved by following means on
> various
> > > +       *    products: PIPE_CONTROL command with CS Stall and the
> required
> > > +       *    write caches flushed with Post-Sync-Operation as Write
> Immediate
> > > +       *    Data.
> > > +       *
> > > +       *    Example:
> > > +       *       - Workload-1 (3D/GPGPU/MEDIA)
> > > +       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
> Immediate
> > > +       *         Data, Required Write Cache Flush bits set)
> > > +       *       - Workload-2 (Can use the data produce or output by
> Workload-1)
> > > +       */
> > > +      brw_emit_pipe_control_write(brw,
> > > +                                  flags | PIPE_CONTROL_CS_STALL |
> > > +                                  PIPE_CONTROL_WRITE_IMMEDIATE,
> > > +                                  brw->workaround_bo, 0, 0);
> > > +
> > > +      if (brw->is_haswell) {
> > > +         /* Haswell needs addition work-arounds:
> > > +          *
> > > +          * From Haswell PRM, volume 2, part 1, "End-of-Pipe
> Synchronization":
> > > +          *
> > > +          *    Option 1:
> > > +          *    PIPE_CONTROL command with the CS Stall and the
> required write
> > > +          *    caches flushed with Post-SyncOperation as Write
> Immediate Data
> > > +          *    followed by eight dummy MI_STORE_DATA_IMM (write to
> scratch
> > > +          *    spce) commands.
> > > +          *
> > > +          *    Example:
> > > +          *       - Workload-1
> > > +          *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
> > > +          *         Immediate Data, Required Write Cache Flush bits
> set)
> > > +          *       - MI_STORE_DATA_IMM (8 times) (Dummy data, Scratch
> Address)
> > > +          *       - Workload-2 (Can use the data produce or output by
> > > +          *         Workload-1)
> > > +          *
> > > +          * Unfortunately, both the PRMs and the internal docs are a
> bit
> > > +          * out-of-date in this regard.  What the windows driver does
> (and
> > > +          * this appears to actually work) is to emit a register read
> from the
> > > +          * memory address written by the pipe control above.
> > > +          *
> > > +          * What register we load into doesn't matter.  We choose an
> indirect
> > > +          * rendering register because we know it always exists and
> doesn't
> > > +          * require command parser support.  This is perfectly safe
> to do
> > > +          * since we always re-load all of the indirect draw
> registers right
> > > +          * before 3DPRIMITIVE when needed anyway.
> > > +          */
> > > +         brw_load_register_mem(brw, GEN7_3DPRIM_START_INSTANCE,
> > > +                               brw->workaround_bo,
> > > +                               I915_GEM_DOMAIN_INSTRUCTION, 0, 0);
> >
> > This does require command parser support.  Your MI_LOAD_REGISTER_MEM
> will be
> > converted to MI_NOOP if the command parser is not enabled.  That means
> that
> > the workaround effectively won't happen for users with kernels older
> than 4.2
> > (specifically kernel sha 245054a1fe33c06ad233e0d58a27ec7b64db9284).
> >
> > ChromeOS in particular was running a 3.8 based kernel, with no command
> > parser, last I checked, which means they won't get this bug fix.
> Applying
> > a kernel patch may actually be a faster route to getting this bug fixed,
> > for them, at least.  For normal Linux distro users, this Mesa patch is
> > probably the fastest approach.  ChromeOS should update their kernel to
> 4.4+
> > anyway.
>
> The kernel does have a LRI after a flush before signaling the batch is
> complete. I don't see a need to add another...
>
> The question is whether this posting is required for GPU visibility of
> results or just CPU? I suspect this is just for CPU in which case it
> doesn't belong here at all, but before flagging rendering as ready for
> async (i.e. not involving the kernel) inspection.
>

The docs, if you choose to believe them, seem to indicate that this is
needed for GPU visibility as well as CPU.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170615/21754ad4/attachment-0001.html>