[Mesa-dev] [PATCH v3 3/3] i965/nir/fs: Implement new barrier functions for compute shaders

Fri Nov 6 13:44:14 PST 2015

On 2015-11-06 04:48:00, Francisco Jerez wrote:
> Jordan Justen <jordan.l.justen at intel.com> writes:
> 
> > For these nir intrinsics, we emit the same code as
> > nir_intrinsic_memory_barrier:
> >
> >  * nir_intrinsic_memory_barrier_atomic_counter
> >  * nir_intrinsic_memory_barrier_buffer
> >  * nir_intrinsic_memory_barrier_image
> >
> > We treat these nir intrinsics as no-ops:
> >  * nir_intrinsic_group_memory_barrier
> >  * nir_intrinsic_memory_barrier_shared
> >
> > v3:
> >  * Add comment for no-op cases (curro)
> >
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > Cc: Francisco Jerez <currojerez at riseup.net>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index b6f4c52..3b3bc67 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -1697,6 +1697,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
> >        break;
> >     }
> >  
> > +   case nir_intrinsic_memory_barrier_atomic_counter:
> > +   case nir_intrinsic_memory_barrier_buffer:
> > +   case nir_intrinsic_memory_barrier_image:
> >     case nir_intrinsic_memory_barrier: {
> >        const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 16 / dispatch_width);
> >        bld.emit(SHADER_OPCODE_MEMORY_FENCE, tmp)
> > @@ -1704,6 +1707,14 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
> >        break;
> >     }
> >  
> > +   case nir_intrinsic_group_memory_barrier:
> > +   case nir_intrinsic_memory_barrier_shared:
> > +      /* We treat these single workgroup level barriers as no-ops. This is
> > +       * safe today because we don't allow memory functions to be re-ordered
> > +       * and we only execute programs on a single sub-slice.
> > +       */
> 
> You forgot to mention the fault-and-stream thing, which is going to be a
> problem already on Gen9.5.  How about the following:
> 
> | /* We treat these workgroup-level barriers as no-ops.  This should be
> |  * safe at present and as long as:
> |  *
> |  *  - Memory access instructions are not subsequently reordered by the
> |  *    compiler back-end.
> |  *
> |  *  - All threads from a given compute shader workgroup fit within a
> |  *    single subslice and therefore talk to the same HDC shared unit
> |  *    what supposedly guarantees ordering and coherency between threads
> |  *    from the same workgroup.  This may change in the future when we
> |  *    start splitting workgroups across multiple subslices.
> |  *
> |  *  - The context is not in fault-and-stream mode, which could cause
> |  *    memory transactions (including to SLM) prior to the barrier to be
> |  *    replayed after the barrier if a pagefault occurs.  This shouldn't
> |  *    be a problem up to and including SKL because fault-and-stream is
> |  *    not usable due to hardware issues, but that's likely to change in
> |  *    the future.
> |  */
> 
> If you use that comment instead this patch is:
> 
> Reviewed-by: Francisco Jerez <currojerez at riseup.net>

Ok, but I think I'll move the comment to a separate patch authored by
you.

Thanks,

-Jordan

> 
> 
> > +      break;
> > +
> >     case nir_intrinsic_shader_clock: {
> >        /* We cannot do anything if there is an event, so ignore it for now */
> >        fs_reg shader_clock = get_timestamp(bld);
> > -- 
> > 2.6.2