[Mesa-dev] [PATCH v3 3/3] i965/nir/fs: Implement new barrier functions for compute shaders

Fri Nov 6 04:48:00 PST 2015

Jordan Justen <jordan.l.justen at intel.com> writes:

> For these nir intrinsics, we emit the same code as
> nir_intrinsic_memory_barrier:
>
>  * nir_intrinsic_memory_barrier_atomic_counter
>  * nir_intrinsic_memory_barrier_buffer
>  * nir_intrinsic_memory_barrier_image
>
> We treat these nir intrinsics as no-ops:
>  * nir_intrinsic_group_memory_barrier
>  * nir_intrinsic_memory_barrier_shared
>
> v3:
>  * Add comment for no-op cases (curro)
>
> Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> Cc: Francisco Jerez <currojerez at riseup.net>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index b6f4c52..3b3bc67 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1697,6 +1697,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>        break;
>     }
>  
> +   case nir_intrinsic_memory_barrier_atomic_counter:
> +   case nir_intrinsic_memory_barrier_buffer:
> +   case nir_intrinsic_memory_barrier_image:
>     case nir_intrinsic_memory_barrier: {
>        const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, 16 / dispatch_width);
>        bld.emit(SHADER_OPCODE_MEMORY_FENCE, tmp)
> @@ -1704,6 +1707,14 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>        break;
>     }
>  
> +   case nir_intrinsic_group_memory_barrier:
> +   case nir_intrinsic_memory_barrier_shared:
> +      /* We treat these single workgroup level barriers as no-ops. This is
> +       * safe today because we don't allow memory functions to be re-ordered
> +       * and we only execute programs on a single sub-slice.
> +       */

You forgot to mention the fault-and-stream thing, which is going to be a
problem already on Gen9.5.  How about the following:

| /* We treat these workgroup-level barriers as no-ops.  This should be
|  * safe at present and as long as:
|  *
|  *  - Memory access instructions are not subsequently reordered by the
|  *    compiler back-end.
|  *
|  *  - All threads from a given compute shader workgroup fit within a
|  *    single subslice and therefore talk to the same HDC shared unit
|  *    what supposedly guarantees ordering and coherency between threads
|  *    from the same workgroup.  This may change in the future when we
|  *    start splitting workgroups across multiple subslices.
|  *
|  *  - The context is not in fault-and-stream mode, which could cause
|  *    memory transactions (including to SLM) prior to the barrier to be
|  *    replayed after the barrier if a pagefault occurs.  This shouldn't
|  *    be a problem up to and including SKL because fault-and-stream is
|  *    not usable due to hardware issues, but that's likely to change in
|  *    the future.
|  */

If you use that comment instead this patch is:

Reviewed-by: Francisco Jerez <currojerez at riseup.net>

> +      break;
> +
>     case nir_intrinsic_shader_clock: {
>        /* We cannot do anything if there is an event, so ignore it for now */
>        fs_reg shader_clock = get_timestamp(bld);
> -- 
> 2.6.2
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151106/fdf86e10/attachment.sig>