[Mesa-dev] [PATCH 13/15] i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware

Jason Ekstrand jason at jlekstrand.net
Thu Apr 7 22:30:06 UTC 2016


On Thu, Apr 7, 2016 at 2:37 PM, Matt Turner <mattst88 at gmail.com> wrote:

> On Wed, Dec 9, 2015 at 8:23 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> > While we're at it, we also add support for the possibility that the
> > indirect is, in fact, a constant.  This shouldn't happen in the common case
> > (if it does, that means NIR failed to constant-fold something), but it's
> > possible so we should handle it.
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp           |  4 ++
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51 +++++++++++++++++++-------
> >  2 files changed, 42 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 9eaf8d0..a2ec03e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -4424,6 +4424,10 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
> >     case SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL:
> >        return 8;
> >
> > +   case SHADER_OPCODE_MOV_INDIRECT:
> > +      /* Prior to Broadwell, we only have 8 address subregisters */
> > +      return devinfo->gen < 8 ? 8 : inst->exec_size;
>
> There are still only 16 on BDW+, would it make sense to change the
> last expression to MIN2(inst->exec_size, 16)?
>

For the sake of Curro and his SIMD32 efforts, yes.  Either that or he can
catch it in the rebase.  I don't care.
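
To make the suggestion concrete, here is a minimal sketch of the width rule with the clamp applied. The helper name and its plain integer parameters are hypothetical; the real logic lives in the switch in get_lowered_simd_width() and reads devinfo->gen and inst->exec_size, and std::min stands in for Mesa's MIN2.

```cpp
#include <algorithm>

// Hypothetical stand-alone helper, not part of the patch.
// gen:       hardware generation (8 == Broadwell)
// exec_size: execution width of the MOV_INDIRECT instruction
unsigned mov_indirect_lowered_width(int gen, unsigned exec_size)
{
   // Prior to Broadwell there are only 8 address subregisters,
   // so the instruction is lowered to SIMD8 (same as the patch).
   if (gen < 8)
      return 8;

   // Broadwell and later have 16 address subregisters, so clamp
   // wider instructions (e.g. SIMD32) to 16.
   return std::min(exec_size, 16u);
}
```

With this rule a SIMD32 instruction on Broadwell or later would be lowered to 16 channels instead of being passed through unchanged.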


>
> > +
> >     default:
> >        return inst->exec_size;
> >     }
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index d86eee1..7fa6d84 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -351,22 +351,47 @@ fs_generator::generate_mov_indirect(fs_inst *inst,
> >
> >     unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr;
> >
> > -   /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
> > -   struct brw_reg addr = vec8(brw_address_reg(0));
> > +   if (indirect_byte_offset.file == BRW_IMMEDIATE_VALUE) {
> > +      imm_byte_offset += indirect_byte_offset.ud;
> >
> > -   /* The destination stride of an instruction (in bytes) must be greater
> > -    * than or equal to the size of the rest of the instruction.  Since the
> > -    * address register is of type UW, we can't use a D-type instruction.
> > -    * In order to get around this, re re-type to UW and use a stride.
> > -    */
> > -   indirect_byte_offset =
> > -      retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
> > +      reg.nr = imm_byte_offset / REG_SIZE;
> > +      reg.subnr = imm_byte_offset % REG_SIZE;
> > +      brw_MOV(p, dst, reg);
> > +   } else {
> > +      /* Prior to Broadwell, there are only 8 address registers. */
> > +      assert(inst->exec_size == 8 || devinfo->gen >= 8);
> >
> > -   /* Prior to Broadwell, there are only 8 address registers. */
> > -   assert(inst->exec_size == 8 || devinfo->gen >= 8);
> > +      /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
> > +      struct brw_reg addr = vec8(brw_address_reg(0));
> >
> > -   brw_MOV(p, addr, indirect_byte_offset);
> > -   brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
> > +      /* The destination stride of an instruction (in bytes) must be greater
> > +       * than or equal to the size of the rest of the instruction.  Since the
> > +       * address register is of type UW, we can't use a D-type instruction.
> > +       * In order to get around this, re re-type to UW and use a stride.
>
> s/re re-type/retype/ while we're moving it.
>

Sure


> > +       */
> > +      indirect_byte_offset =
> > +         retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
> > +
> > +      if (devinfo->gen < 8) {
> > +         /* Prior to broadwell, we have a restriction that the bottom 5 bits
> > +          * of the base offset and the bottom 5 bits of the indirect must add
> > +          * to less than 32.  In other words, the hardware needs to be able to
> > +          * add the bottom five bits of the two to get the subnumber and add
> > +          * the next 7 bits of each to get the actual register number.  Since
> > +          * the indirect may cause us to cross a register boundary, this makes
> > +          * it almost useless.  We could try and do something clever where we
> > +          * use a actual base offset if base_offset % 32 == 0 but that would
> > +          * mean we were generating different code depending on the base
> > +          * offset.  Instead, for the sake of consistency, we'll just do the
> > +          * add ourselves.
> > +          */
> > +         brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
> > +         brw_MOV(p, dst, retype(brw_VxH_indirect(0, 0), dst.type));
> > +      } else {
> > +         brw_MOV(p, addr, indirect_byte_offset);
> > +         brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
> > +      }
> > +   }
> >  }
> >
> >  void
> > --
> > 2.5.0.400.gff86faf
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
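
For anyone following the pre-Broadwell comment in the patch, here is a small sketch of the arithmetic it describes. This is a toy model, not driver code: the struct and function names are made up for illustration, REG_SIZE is assumed to be 32 bytes, and the model simply drops the carry when the low five bits overflow. What the hardware actually does when the restriction is violated is not something the comment states.

```cpp
#include <cassert>

// Assumption: a GRF register is 32 bytes, matching REG_SIZE in the driver.
constexpr unsigned REG_SIZE = 32;

// Hypothetical illustration type: a register number plus a byte subnumber.
struct RegAddr {
   unsigned nr;
   unsigned subnr;
};

// What we want: add the byte offsets first, then split into
// register number and subnumber (the constant case in the patch
// does the same split on imm_byte_offset).
RegAddr split_flat(unsigned base, unsigned indirect)
{
   unsigned total = base + indirect;
   return { total / REG_SIZE, total % REG_SIZE };
}

// Toy model of the restriction: the bottom 5 bits and the upper
// bits are added separately, with no carry between the two fields.
RegAddr split_fieldwise(unsigned base, unsigned indirect)
{
   unsigned subnr = (base % REG_SIZE + indirect % REG_SIZE) % REG_SIZE;
   unsigned nr = base / REG_SIZE + indirect / REG_SIZE;
   return { nr, subnr };
}

// The condition from the comment: the low 5 bits must add to less than 32.
bool fieldwise_is_safe(unsigned base, unsigned indirect)
{
   return base % REG_SIZE + indirect % REG_SIZE < REG_SIZE;
}
```

In this model a base of 36 and an indirect of 30 violate the condition and the two splits disagree on the register number, while a base of 0 is always safe. That is the reason the gen < 8 path folds imm_byte_offset into the address register with an ADD and then uses a base offset of 0.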