[Mesa-dev] [PATCH v4 07/28] i965/fs: generalize the legalization d2x pass

Fri Mar 24 06:05:24 UTC 2017

On Thu, 2017-03-23 at 12:01 -0700, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
> > On Wed, 2017-03-22 at 15:47 -0700, Francisco Jerez wrote:
> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > > 
> > > > Generalize it to lower any unsupported narrower conversion.
> > > > 
> > > > v2 (Curro):
> > > > - Add supports_type_conversion()
> > > > - Reuse existing intruction instead of cloning it.
> > > > - Generalize d2x to narrower and equal size conversions.
> > > > 
> > > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> > > > ---
> > > >  src/intel/compiler/brw_fs.cpp           | 11 ++--
> > > >  src/intel/compiler/brw_fs_lower_d2x.cpp | 97
> > > > ++++++++++++++++++++++++---------
> > > >  2 files changed, 77 insertions(+), 31 deletions(-)
> > > > 
> > > > diff --git a/src/intel/compiler/brw_fs.cpp
> > > > b/src/intel/compiler/brw_fs.cpp
> > > > index 6da23b17107..8612a83805d 100644
> > > > --- a/src/intel/compiler/brw_fs.cpp
> > > > +++ b/src/intel/compiler/brw_fs.cpp
> > > > @@ -5694,11 +5694,6 @@ fs_visitor::optimize()
> > > >        OPT(dead_code_eliminate);
> > > >     }
> > > >  
> > > > -   if (OPT(lower_d2x)) {
> > > > -      OPT(opt_copy_propagation);
> > > > -      OPT(dead_code_eliminate);
> > > > -   }
> > > > -
> > > >     OPT(lower_simd_width);
> > > >  
> > > >     /* After SIMD lowering just in case we had to unroll the
> > > > EOT
> > > > send. */
> > > > @@ -5745,6 +5740,12 @@ fs_visitor::optimize()
> > > >        OPT(dead_code_eliminate);
> > > >     }
> > > >  
> > > > +   if (OPT(lower_d2x)) {
> > > > +      OPT(opt_copy_propagation);
> > > > +      OPT(dead_code_eliminate);
> > > > +      OPT(lower_simd_width);
> > > > +   }
> > > > +
> > > >     lower_uniform_pull_constant_loads();
> > > >  
> > > >     validate();
> > > > diff --git a/src/intel/compiler/brw_fs_lower_d2x.cpp
> > > > b/src/intel/compiler/brw_fs_lower_d2x.cpp
> > > > index a2db1154615..fa25d313dcd 100644
> > > > --- a/src/intel/compiler/brw_fs_lower_d2x.cpp
> > > > +++ b/src/intel/compiler/brw_fs_lower_d2x.cpp
> > > > @@ -27,47 +27,92 @@
> > > >  
> > > >  using namespace brw;
> > > >  
> > > > +static bool
> > > > +supports_type_conversion(fs_inst *inst) {
> > > 
> > > Pointer should be marked const.
> > > 
> > > > +   switch(inst->opcode) {
> > > > +   case BRW_OPCODE_MOV:
> > > > +   case SHADER_OPCODE_MOV_INDIRECT:
> > > > +      return true;
> > > > +   case BRW_OPCODE_SEL:
> > > > +      return false;
> > > 
> > > I suggest you return 'inst->dst.type == get_exec_type_size(inst)'
> > > here
> > > in order to simplify the logic below and restructure things in a
> > > way
> > > that allows you to make opcode-specific decisions here based on
> > > the
> > > actual execution and destination types.
> > 
> > I guess you meant:
> >    'type_sz(inst->dst.type) == get_exec_type_size(inst)'
> > 
> 
> Uhm, not really, you really want to compare the destination type with
> the execution type of the instruction.
> 

I misunderstood you because of the name: it should be
get_exec_type(inst) (i.e. without _size).

I am going to create it.

Thanks,

Sam

> > This would make it simpler but we then allow f2d conversion which
> > is
> > not allowed for SEL.
> > 
> > > 
> > > > +   default:
> > > > +      /* FIXME: We assume the opcodes don't explicitly
> > > > mentioned
> > > > +       * before just work fine with arbitrary conversions.
> > > > +       */
> > > > +      return true;
> > > > +   }
> > > > +}
> > > > +
> > > >  bool
> > > >  fs_visitor::lower_d2x()
> > > >  {
> > > >     bool progress = false;
> > > >  
> > > >     foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
> > > > -      if (inst->opcode != BRW_OPCODE_MOV)
> > > > -         continue;
> > > > +      bool inst_support_conversion =
> > > > supports_type_conversion(inst);
> > > > +      bool supported_conversion =
> > > > +         inst_support_conversion &&
> > > > +         (get_exec_type_size(inst) != 8 ||
> > > > +          type_sz(inst->dst.type) > 4 ||
> > > > +          type_sz(inst->dst.type) >=
> > > > get_exec_type_size(inst));
> > > > 
> > > > -      if (inst->dst.type != BRW_REGISTER_TYPE_F &&
> > > > -          inst->dst.type != BRW_REGISTER_TYPE_D &&
> > > > -          inst->dst.type != BRW_REGISTER_TYPE_UD)
> > > > +      /* If the conversion is supported or there is no
> > > > conversion
> > > > then
> > > > +       * do nothing.
> > > > +       */
> > > > +      if (supported_conversion ||
> > > > +          (!inst_support_conversion && inst->dst.type == inst-
> > > > > src[0].type) ||
> > > 
> > > You should be using the execution type here (and below where you
> > > allocate the temp0 and temp1 vgrfs) instead of assuming it
> > > matches
> > > the
> > > type of the first source.  Also I'm not 100% certain that the
> > > combination of this conditional continue, the one below, the
> > > if/else
> > > branches below and the supported_conversion definitions above
> > > catch
> > > the
> > > exact set of cases where you need to apply lowering.  This would
> > > be a
> > > lot easier to follow (most of the above goes away) if you did
> > > something
> > > along the lines of:
> > > 
> > 
> > OK, thanks.
> > 
> > > > if (supports_type_conversion(inst)) {
> > > >    if (get_exec_type_size(inst) == 8 && type_sz(inst->dst.type) 
> > > > <
> > > > 8) {
> > > >       // Apply strided move workaround...
> > > >       progress = true;
> > > >    }
> > > > } else {
> > > >    // Apply general conversion workaround...
> > > >    progress = true;
> > > > }
> > > > +          inst->dst.file == BAD_FILE || inst->src[0].file ==
> > > > BAD_FILE)
> > > 
> > > I don't think the 'inst->dst.file == BAD_FILE || inst-
> > > >src[0].file ==
> > > BAD_FILE' terms of this expression are correct or useful, the
> > > 'inst->src[0].file' term because there might be a destination
> > > region
> > > conversion regardless of what the file of the first source is,
> > > the
> > > 'inst->dst.file == BAD_FILE' term because you could just rely on
> > > DCE
> > > instead...
> > > 
> > 
> > OK
> > 
> > > >           continue;
> > > >  
> > > > -      if (inst->src[0].type != BRW_REGISTER_TYPE_DF &&
> > > > -          inst->src[0].type != BRW_REGISTER_TYPE_UQ &&
> > > > -          inst->src[0].type != BRW_REGISTER_TYPE_Q)
> > > > -         continue;
> > > > +      /* This pass only supports conversion to narrower or
> > > > equal
> > > > size types. */
> > > > +      if (get_exec_type_size(inst) < type_sz(inst->dst.type))
> > > > +          continue;
> > > >  
> > > 
> > > I think you need to apply this condition to the if branch below
> > > that
> > > deals with DF to F conversions *only*, the else branch you
> > > probably
> > > need
> > > to apply regardless of whether the destination type is larger or
> > > smaller
> > > than the execution type.
> > > 
> > 
> > OK
> > 
> > > > -      assert(inst->dst.file == VGRF);
> > > >        assert(inst->saturate == false);
> > > 
> > > I don't see why it's safe to rely on saturate being false here,
> > > in
> > > any
> > > case it would be straightforward to handle it by setting the flag
> > > on
> > > the
> > > emitted MOV instruction and clearing it on the original
> > > instruction.
> > > 
> > 
> > OK
> > 
> > > > -      fs_reg dst = inst->dst;
> > > >  
> > > >        const fs_builder ibld(this, block, inst);
> > > > +      fs_reg dst = inst->dst;
> > > >  
> > > > -      /* From the Broadwell PRM, 3D Media GPGPU, "Double
> > > > Precision
> > > > Float to
> > > > -       * Single Precision Float":
> > > > -       *
> > > > -       *    The upper Dword of every Qword will be written
> > > > with
> > > > undefined
> > > > -       *    value when converting DF to F.
> > > > -       *
> > > > -       * So we need to allocate a temporary that's two
> > > > registers,
> > > > and then do
> > > > -       * a strided MOV to get the lower DWord of every Qword
> > > > that
> > > > has the
> > > > -       * result.
> > > > -       */
> > > > -      fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
> > > > -      fs_reg strided_temp = subscript(temp, inst->dst.type,
> > > > 0);
> > > > -      ibld.MOV(strided_temp, inst->src[0]);
> > > > -      ibld.MOV(dst, strided_temp);
> > > > +      if (inst_support_conversion && !supported_conversion) {
> > > > +         /* From the Broadwell PRM, 3D Media GPGPU, "Double
> > > > Precision Float to
> > > > +          * Single Precision Float":
> > > > +          *
> > > > +          *    The upper Dword of every Qword will be written
> > > > with
> > > > undefined
> > > > +          *    value when converting DF to F.
> > > > +          *
> > > > +          * So we need to allocate a temporary that's two
> > > > registers, and then do
> > > > +          * a strided MOV to get the lower DWord of every
> > > > Qword
> > > > that has the
> > > > +          * result.
> > > > +          */
> > > > +         fs_reg temp = ibld.vgrf(inst->src[0].type, 1);
> > > 
> > > The '1' argument is redundant in all fs_builder::vgrf() calls
> > > here
> > > and
> > > below.
> > > 
> > 
> > OK
> > 
> > > > +         fs_reg strided_temp = subscript(temp, dst.type, 0);
> > > > +
> > > > +         /* We clone the original instruction as we are going
> > > > to
> > > > modify it
> > > > +          * and emit another one after it.
> > > > +          */
> > > 
> > > Comment seems stale.
> > > 
> > 
> > Right.
> > 
> > > > +         inst->dst = strided_temp;
> > > > +         /* As it is an strided destination, we write n-times
> > > > more
> > > > being n the
> > > > +          * size difference between source and destination
> > > > types.
> > > > Update
> > > 
> > > I guess by difference you meant ratio here and in the copy-pasted
> > > comment below (the comments also have a few minor spelling
> > > mistakes
> > > but
> > > that's unlikely to cause as much confusion).
> > > 
> > 
> > Right.
> > 
> > > > +          * size_written with the new destination.
> > > > +          */
> > > > +         inst->size_written = inst->dst.component_size(inst-
> > > > > exec_size);
> > > 
> > > I think I'd add an assert(inst->size_written ==
> > > inst->dst.component_size(inst->exec_size)) right before you
> > > assign
> > > 'inst->dst' to point at the temporary in order to make sure you
> > > aren't
> > > accidentally miscompiling instructions that write multiple
> > > components.
> > > 
> > 
> > OK
> > 
> > > > +         ibld.at(block, inst->next).MOV(dst, strided_temp);
> > > > +      } else {
> > > > +         fs_reg temp0 = ibld.vgrf(inst->src[0].type, 1);
> > > > +         fs_reg temp1 = ibld.vgrf(inst->src[0].type, 1);
> > > > +         fs_reg strided_temp1 = subscript(temp1, dst.type, 0);
> > > > +
> > > > +         inst->dst = temp0;
> > > > +         /* As it is an strided destination, we write n-times
> > > > more
> > > > being n the
> > > > +          * size difference between source and destination
> > > > types.
> > > > Update
> > > > +          * size_written with the new destination.
> > > > +          */
> > > > +         inst->size_written = inst->dst.component_size(inst-
> > > > > exec_size);
> > > > 
> > > >  
> > > > -      inst->remove(block);
> > > > +         /* Now, do the conversion to original destination's
> > > > type.
> > > > */
> > > > +         fs_inst *mov = ibld.at(block, inst-
> > > > > next).MOV(strided_temp1, temp0);
> > > > 
> > > > +         ibld.at(block, mov->next).MOV(dst, strided_temp1);
> > > 
> > > Isn't this MOV and the strided_temp1 handling redundant?  I'd
> > > expect
> > > it
> > > to be handled automatically during the next loop iteration of
> > > this
> > > pass
> > > (though you may have to switch back to foreach_block_and_inst()
> > > instead
> > > of foreach_block_and_inst_safe()), which would also result in
> > > better
> > > code because the extra move would only be emitted if this is an
> > > actual
> > > DF->F conversion.
> > > 
> > 
> > Good catch!
> > 
> > Thanks for all the suggestions, I am going to do them.
> > 
> > Sam
> > 
> > > > +      }
> > > >        progress = true;
> > > >     }
> > > >  
> > > > -- 
> > > > 2.11.0
> > > > 
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170324/8c32a85d/attachment-0001.sig>