[Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
Chris Wilson
chris at chris-wilson.co.uk
Thu Jun 15 11:11:13 UTC 2017
Quoting Kenneth Graunke (2017-06-14 21:41:56)
> On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > From: Topi Pohjolainen <topi.pohjolainen at intel.com>
> >
> > v2 (Jason Ekstrand):
> > - Take a flags parameter to control the flushes
> > - Refactoring
> >
> > Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > ---
> > src/mesa/drivers/dri/i965/brw_context.h | 1 +
> > src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++++++++++++++++++++++++++-
> > 2 files changed, 96 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > index 7b9be8a..b137409 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
> > void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
> > struct brw_bo *bo, uint32_t offset,
> > uint64_t imm);
> > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
> > void brw_emit_mi_flush(struct brw_context *brw);
> > void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> > void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > index 39bb9c7..338e4fc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> > brw->workaround_bo, 0, 0);
> > }
> >
> > -
> > /**
> > * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> > * implementing two workarounds on gen6. From section 1.4.7.1
> > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
> > brw->workaround_bo, 0, 0);
> > }
> >
> > +/*
> > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> > + *
> > + * Write synchronization is a special case of end-of-pipe
> > + * synchronization that requires that the render cache and/or depth
> > + * related caches are flushed to memory, where the data will become
> > + * globally visible. This type of synchronization is required prior to
> > + * SW (CPU) actually reading the result data from memory, or initiating
> > + * an operation that will use as a read surface (such as a texture
> > + * surface) a previous render target and/or depth/stencil buffer
> > + *
> > + *
> > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > + *
> > + * Exercising the write cache flush bits (Render Target Cache Flush
> > + * Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > + * ensures the write caches are flushed and doesn't guarantee the data
> > + * is globally visible.
> > + *
> > + * SW can track the completion of the end-of-pipe-synchronization by
> > + * using "Notify Enable" and "PostSync Operation - Write Immediate
> > + * Data" in the PIPE_CONTROL command.
> > + */
> > +void
> > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > +{
> > + if (brw->gen >= 6) {
> > + /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> > + *
> > + * "The most common action to perform upon reaching a synchronization
> > + * point is to write a value out to memory. An immediate value
> > + * (included with the synchronization command) may be written."
> > + *
> > + *
> > + * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > + *
> > + * "In case the data flushed out by the render engine is to be read
> > + * back in to the render engine in coherent manner, then the render
> > + * engine has to wait for the fence completion before accessing the
> > + * flushed data. This can be achieved by following means on various
> > + * products: PIPE_CONTROL command with CS Stall and the required
> > + * write caches flushed with Post-Sync-Operation as Write Immediate
> > + * Data.
> > + *
> > + * Example:
> > + * - Workload-1 (3D/GPGPU/MEDIA)
> > + * - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> > + * Data, Required Write Cache Flush bits set)
> > + * - Workload-2 (Can use the data produce or output by Workload-1)
> > + */
> > + brw_emit_pipe_control_write(brw,
> > + flags | PIPE_CONTROL_CS_STALL |
> > + PIPE_CONTROL_WRITE_IMMEDIATE,
> > + brw->workaround_bo, 0, 0);
> > +
> > + if (brw->is_haswell) {
> > + /* Haswell needs addition work-arounds:
> > + *
> > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > + *
> > + * Option 1:
> > + * PIPE_CONTROL command with the CS Stall and the required write
> > + * caches flushed with Post-SyncOperation as Write Immediate Data
> > + * followed by eight dummy MI_STORE_DATA_IMM (write to scratch
> > + * spce) commands.
> > + *
> > + * Example:
> > + * - Workload-1
> > + * - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
> > + * Immediate Data, Required Write Cache Flush bits set)
> > + * - MI_STORE_DATA_IMM (8 times) (Dummy data, Scratch Address)
> > + * - Workload-2 (Can use the data produce or output by
> > + * Workload-1)
> > + *
> > + * Unfortunately, both the PRMs and the internal docs are a bit
> > + * out-of-date in this regard. What the windows driver does (and
> > + * this appears to actually work) is to emit a register read from the
> > + * memory address written by the pipe control above.
> > + *
> > + * What register we load into doesn't matter. We choose an indirect
> > + * rendering register because we know it always exists and doesn't
> > + * require command parser support. This is perfectly safe to do
> > + * since we always re-load all of the indirect draw registers right
> > + * before 3DPRIMITIVE when needed anyway.
> > + */
> > + brw_load_register_mem(brw, GEN7_3DPRIM_START_INSTANCE,
> > + brw->workaround_bo,
> > + I915_GEM_DOMAIN_INSTRUCTION, 0, 0);
>
> This does require command parser support. Your MI_LOAD_REGISTER_MEM will be
> converted to MI_NOOP if the command parser is not enabled. That means that
> the workaround effectively won't happen for users with kernels older than 4.2
> (specifically kernel sha 245054a1fe33c06ad233e0d58a27ec7b64db9284).
>
> ChromeOS in particular was running a 3.8 based kernel, with no command
> parser, last I checked, which means they won't get this bug fix. Applying
> a kernel patch may actually be a faster route to getting this bug fixed,
> for them, at least. For normal Linux distro users, this Mesa patch is
> probably the fastest approach. ChromeOS should update their kernel to 4.4+
> anyway.
The kernel does have a LRI after a flush before signaling the batch is
complete. I don't see a need to add another...
The question is whether this posting is required for GPU visibility of
results or just CPU? I suspect this is just for CPU in which case it
doesn't belong here at all, but before flagging rendering as ready for
async (i.e. not involving the kernel) inspection.
-Chris
More information about the mesa-dev
mailing list