[Mesa-dev] [PATCH v4 20/40] intel/compiler: workaround for SIMD8 half-float MAD in gen8

Iago Toral itoral at igalia.com
Mon Feb 18 08:05:22 UTC 2019


On Sat, 2019-02-16 at 09:02 -0600, Jason Ekstrand wrote:
> On Tue, Feb 12, 2019 at 5:56 AM Iago Toral Quiroga <itoral at igalia.com
> > wrote:
> > Empirical testing shows that gen8 has a bug where MAD instructions
> > with
> > 
> > a half-float source starting at a non-zero offset fail to execute
> > 
> > properly.
> > 
> > 
> > 
> > This scenario usually happened in SIMD8 executions, where we used
> > to
> > 
> > pack vector components Y and W in the second half of SIMD registers
> > 
> > (therefore, with a 16B offset). It looks like we are not currently
> > doing
> > 
> > this any more but this would handle the situation properly if we
> > ever
> > 
> > happen to produce code like this again.
> > 
> > 
> > 
> > v2 (Jason):
> > 
> >  - Move this workaround to the lower_regioning pass as an
> > additional case
> > 
> >    to has_invalid_src_region()
> > 
> >  - Do not apply the workaround if the stride of the source operand
> > is 0,
> > 
> >    testing suggests the problem doesn't exist in that case.
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com> (v1)
> > 
> > ---
> > 
> >  src/intel/compiler/brw_fs_lower_regioning.cpp | 39 +++++++++++++
> > ------
> > 
> >  1 file changed, 28 insertions(+), 11 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > 
> > index df50993dee6..7c70cfab535 100644
> > 
> > --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > 
> > @@ -109,20 +109,37 @@ namespace {
> > 
> >     has_invalid_src_region(const gen_device_info *devinfo, const
> > fs_inst *inst,
> > 
> >                            unsigned i)
> > 
> >     {
> > 
> > -      if (is_unordered(inst)) {
> > 
> > +      if (is_unordered(inst))
> > 
> >           return false;
> > 
> > -      } else {
> > 
> > -         const unsigned dst_byte_stride = inst->dst.stride *
> > type_sz(inst->dst.type);
> > 
> > -         const unsigned src_byte_stride = inst->src[i].stride *
> > 
> > -            type_sz(inst->src[i].type);
> > 
> > -         const unsigned dst_byte_offset = reg_offset(inst->dst) %
> > REG_SIZE;
> > 
> > -         const unsigned src_byte_offset = reg_offset(inst->src[i]) 
> > % REG_SIZE;
> > 
> > 
> > 
> > -         return has_dst_aligned_region_restriction(devinfo, inst)
> > &&
> > 
> > -                !is_uniform(inst->src[i]) &&
> > 
> > -                (src_byte_stride != dst_byte_stride ||
> > 
> > -                 src_byte_offset != dst_byte_offset);
> > 
> > +      /* Empirical testing shows that Broadwell has a bug
> > affecting half-float
> > 
> > +       * MAD instructions when any of its sources has a non-zero
> > offset, such
> > 
> > +       * as:
> > 
> > +       *
> > 
> > +       * mad(8) g18<1>HF -g17<4,4,1>HF g14.8<4,4,1>HF g11<4,4,1>HF
> > { align16 1Q };
> > 
> > +       *
> > 
> > +       * We used to generate code like this for SIMD8 executions
> > where we
> > 
> > +       * used to pack components Y and W of a vector at offset 16B
> > of a SIMD
> > 
> > +       * register. The problem doesn't occur if the stride of the
> > source is 0.
> > 
> > +       */
> > 
> > +      if (devinfo->gen == 8 &&
> > 
> > +          inst->opcode == BRW_OPCODE_MAD &&
> > 
> > +          inst->src[i].type == BRW_REGISTER_TYPE_HF &&
> > 
> > +          inst->src[i].offset > 0 &&
> > 
> > +          inst->src[i].stride != 0) {
> 
> The above assumes the register is a GRF.  Perhaps we should make this
> assumption explicit?  Or you can use some of curro's helpers and add
> another one to get the subreg offset.  Also, the real problem here
> isn't offset > 0, it's offset % REG_SIZE > 0.  If we have an array of
> 4 things, they'll be at offsets 0, 16, 32, and 48.  We don't want an
> offset of 32 triggering it.
> 

You're right. We already have a helper available that does what we
want, reg_offset() in brw_ir_fs.h. I have this now:
      if (devinfo->gen == 8 &&          inst->opcode == BRW_OPCODE_MAD
&&          inst->src[i].type == BRW_REGISTER_TYPE_HF
&&          reg_offset(inst->src[i]) % REG_SIZE > 0 &&          inst-
>src[i].stride != 0) {         return true;      }
> > +         return true;
> > 
> >        }
> > 
> > +
> > 
> > +      const unsigned dst_byte_stride = inst->dst.stride *
> > type_sz(inst->dst.type);
> > 
> > +      const unsigned src_byte_stride = inst->src[i].stride *
> > 
> > +         type_sz(inst->src[i].type);
> > 
> > +      const unsigned dst_byte_offset = reg_offset(inst->dst) %
> > REG_SIZE;
> > 
> > +      const unsigned src_byte_offset = reg_offset(inst->src[i]) %
> > REG_SIZE;
> > 
> > +
> > 
> > +      return has_dst_aligned_region_restriction(devinfo, inst) &&
> > 
> > +             !is_uniform(inst->src[i]) &&
> > 
> > +             (src_byte_stride != dst_byte_stride ||
> > 
> > +              src_byte_offset != dst_byte_offset);
> > 
> >     }
> > 
> > 
> > 
> >     /*
> > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190218/1672312c/attachment.html>


More information about the mesa-dev mailing list