[Mesa-dev] [PATCH 5/6] intel/fs: Handle surface opcode sample masks via predication.

Kenneth Graunke kenneth at whitecape.org
Thu Mar 1 19:34:20 UTC 2018


On Tuesday, February 27, 2018 1:38:27 PM PST Francisco Jerez wrote:
> The main motivation is to enable HDC surface opcodes on ICL which no
> longer allows the sample mask to be provided in a message header, but
> this is enabled all the way back to IVB when possible because it
> decreases the instruction count of some shaders using HDC messages
> significantly, e.g. one of the SynMark2 CSDof compute shaders
> decreases instruction count by about 40% due to the removal of header
> setup boilerplate which in turn makes a number of send message
> payloads more easily CSE-able.  Shader-db results on SKL:
> 
>  total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
>  instructions in affected programs: 311532 -> 300597 (-3.51%)
>  helped: 491
>  HURT: 1
> 
> Shader-db results on BDW where the optimization needs to be disabled
> in some cases due to hardware restrictions:
> 
>  total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
>  instructions in affected programs: 220863 -> 214097 (-3.06%)
>  helped: 351
>  HURT: 0
> 
> The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
> laptop with this change.
> ---
>  src/intel/compiler/brw_fs.cpp | 42 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 0b87d8ab14e..639432b4f49 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -4432,6 +4432,8 @@ static void
>  lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
>                             const fs_reg &sample_mask)
>  {
> +   const gen_device_info *devinfo = bld.shader->devinfo;
> +
>     /* Get the logical send arguments. */
>     const fs_reg &addr = inst->src[0];
>     const fs_reg &src = inst->src[1];
> @@ -4442,7 +4444,20 @@ lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
>     /* Calculate the total number of components of the payload. */
>     const unsigned addr_sz = inst->components_read(0);
>     const unsigned src_sz = inst->components_read(1);
> -   const unsigned header_sz = (sample_mask.file == BAD_FILE ? 0 : 1);
> +   /* From the BDW PRM Volume 7, page 147:
> +    *
> +    *  "For the Data Cache Data Port*, the header must be present for the
> +    *   following message types: [...] Typed read/write/atomics"
> +    *
> +    * Earlier generations have a similar wording.  Because of this restriction
> +    * we don't attempt to implement sample masks via predication for such
> +    * messages prior to Gen9, since we have to provide a header anyway.  On
> +    * Gen11+ the header has been removed so we can only use predication.
> +    */
> +   const unsigned header_sz = devinfo->gen < 9 &&
> +                              (op == SHADER_OPCODE_TYPED_SURFACE_READ ||
> +                               op == SHADER_OPCODE_TYPED_SURFACE_WRITE ||
> +                               op == SHADER_OPCODE_TYPED_ATOMIC) ? 1 : 0;
>     const unsigned sz = header_sz + addr_sz + src_sz;
>  
>     /* Allocate space for the payload. */
> @@ -4462,6 +4477,31 @@ lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
>  
>     bld.LOAD_PAYLOAD(payload, components, sz, header_sz);
>  
> +   /* Predicate the instruction on the sample mask if no header is
> +    * provided.
> +    */
> +   if (!header_sz && sample_mask.file != BAD_FILE &&
> +       sample_mask.file != IMM) {
> +      const fs_builder ubld = bld.group(1, 0).exec_all();
> +      if (inst->predicate) {
> +         assert(inst->predicate == BRW_PREDICATE_NORMAL);
> +         assert(!inst->predicate_inverse);
> +         assert(inst->flag_subreg < 2);
> +         /* Combine the sample mask with the existing predicate by using a
> +          * vertical predication mode.
> +          */
> +         inst->predicate = BRW_PREDICATE_ALIGN1_ALLV;
> +         ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg + 2),
> +                         sample_mask.type),
> +                  sample_mask);

I was surprised to see flag_subreg left unchanged here, but then I
re-read how ALLV works: it ANDs f0.0 with f1.0, or f0.1 with f1.1.  So
we can leave flag_subreg as 0 or 1, and the predicate will implicitly
use subregister 2 or 3 as well.
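
To make the pairing concrete, here is a minimal sketch that models the
flag registers as four 16-bit subregisters, with index 0..3 standing
for f0.0, f0.1, f1.0 and f1.1.  The FlagFile struct and both helper
functions are hypothetical names made up for illustration; they are not
Mesa or hardware API, and the model assumes the vertical pairing
described in this review:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the flag file: sub[0..3] = f0.0, f0.1, f1.0, f1.1.
// One bit per channel, 16 channels per subregister.
struct FlagFile {
   uint16_t sub[4];
};

// Normal predication: channel enables come from a single subregister.
inline uint16_t predicate_normal(const FlagFile &f, unsigned flag_subreg)
{
   assert(flag_subreg < 4);
   return f.sub[flag_subreg];
}

// Vertical "all" predication (assumed model): a channel is enabled only
// if its bit is set in both fN.M and the subregister two slots above it,
// i.e. f0.0 & f1.0 for flag_subreg == 0, f0.1 & f1.1 for flag_subreg == 1.
inline uint16_t predicate_allv(const FlagFile &f, unsigned flag_subreg)
{
   assert(flag_subreg < 2);
   return f.sub[flag_subreg] & f.sub[flag_subreg + 2];
}
```

In this model the existing predicate stays in subregister 0 or 1, the
sample mask is written to subregister flag_subreg + 2, and the send is
enabled only on channels where both bits are set.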

Series is:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

> +      } else {
> +         inst->flag_subreg = 2;
> +         inst->predicate = BRW_PREDICATE_NORMAL;
> +         ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg), sample_mask.type),
> +                  sample_mask);
> +      }
> +   }
> +
>     /* Update the original instruction. */
>     inst->opcode = op;
>     inst->mlen = header_sz + (addr_sz + src_sz) * inst->exec_size / 8;
> 
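
For reference, the payload sizing in the quoted hunks can be sketched in
isolation.  This is only an illustration under stated assumptions: the
SurfaceOp enum and the function names below are hypothetical stand-ins
for the SHADER_OPCODE_* values and the inline expressions in
lower_surface_logical_send(), and register sizes are taken to be 8
channels wide as in the mlen expression of the patch:

```cpp
// Hypothetical stand-in for the surface opcodes handled by the patch.
enum class SurfaceOp {
   UntypedRead, UntypedWrite, UntypedAtomic,
   TypedRead, TypedWrite, TypedAtomic
};

constexpr bool is_typed(SurfaceOp op)
{
   return op == SurfaceOp::TypedRead ||
          op == SurfaceOp::TypedWrite ||
          op == SurfaceOp::TypedAtomic;
}

// A header is only kept where the hardware requires one: typed messages
// before Gen9.  Everything else relies on predication instead.
constexpr unsigned header_size(int gen, SurfaceOp op)
{
   return (gen < 9 && is_typed(op)) ? 1 : 0;
}

// Message length in registers, mirroring the mlen expression in the patch:
// one header register (if any) plus the address and source components,
// each scaled by the execution size in units of 8 channels.
constexpr unsigned message_length(unsigned header_sz, unsigned addr_sz,
                                  unsigned src_sz, unsigned exec_size)
{
   return header_sz + (addr_sz + src_sz) * exec_size / 8;
}
```

For example, a SIMD16 untyped message with one address component and
one source component needs no header and occupies four registers, while
a SIMD8 typed write on Gen8 with two address and four source components
keeps its header and occupies seven.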
