[Mesa-dev] [PATCH 13/15] i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
Jason Ekstrand
jason at jlekstrand.net
Tue Dec 15 18:32:14 PST 2015
On Dec 15, 2015 12:30 AM, "Abdiel Janulgue" <abdiel.janulgue at linux.intel.com> wrote:
>
>
>
> On 12/10/2015 06:23 AM, Jason Ekstrand wrote:
> > While we're at it, we also add support for the possibility that the
> > indirect is, in fact, a constant. This shouldn't happen in the common
> > case (if it does, that means NIR failed to constant-fold something), but
> > it's possible so we should handle it.
>
> Perhaps this should be re-ordered before patch 3?
We could, but it really doesn't matter. No MOV_INDIRECT ever hits the
generator pre-BDW prior to patch 15. They get lowered away to pull constant
loads.
--Jason
> > ---
> > src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++
> > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51 +++++++++++++++++++-------
> > 2 files changed, 42 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 9eaf8d0..a2ec03e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -4424,6 +4424,10 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
> > case SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL:
> > return 8;
> >
> > + case SHADER_OPCODE_MOV_INDIRECT:
> > + /* Prior to Broadwell, we only have 8 address subregisters */
> > + return devinfo->gen < 8 ? 8 : inst->exec_size;
> > +
> > default:
> > return inst->exec_size;
> > }
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index d86eee1..7fa6d84 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -351,22 +351,47 @@ fs_generator::generate_mov_indirect(fs_inst *inst,
> >
> > unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr;
> >
> > - /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
> > - struct brw_reg addr = vec8(brw_address_reg(0));
> > + if (indirect_byte_offset.file == BRW_IMMEDIATE_VALUE) {
> > + imm_byte_offset += indirect_byte_offset.ud;
> >
> > - /* The destination stride of an instruction (in bytes) must be greater
> > - * than or equal to the size of the rest of the instruction. Since the
> > - * address register is of type UW, we can't use a D-type instruction.
> > - * In order to get around this, re re-type to UW and use a stride.
> > - */
> > - indirect_byte_offset =
> > - retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
> > + reg.nr = imm_byte_offset / REG_SIZE;
> > + reg.subnr = imm_byte_offset % REG_SIZE;
> > + brw_MOV(p, dst, reg);
> > + } else {
> > + /* Prior to Broadwell, there are only 8 address registers. */
> > + assert(inst->exec_size == 8 || devinfo->gen >= 8);
> >
> > - /* Prior to Broadwell, there are only 8 address registers. */
> > - assert(inst->exec_size == 8 || devinfo->gen >= 8);
> > + /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
> > + struct brw_reg addr = vec8(brw_address_reg(0));
> >
> > - brw_MOV(p, addr, indirect_byte_offset);
> > - brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
> > + /* The destination stride of an instruction (in bytes) must be greater
> > + * than or equal to the size of the rest of the instruction. Since the
> > + * address register is of type UW, we can't use a D-type instruction.
> > + * In order to get around this, we re-type to UW and use a stride.
> > + */
> > + indirect_byte_offset =
> > + retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
> > +
> > + if (devinfo->gen < 8) {
> > + /* Prior to Broadwell, we have a restriction that the bottom 5 bits
> > + * of the base offset and the bottom 5 bits of the indirect must add
> > + * to less than 32. In other words, the hardware needs to be able to
> > + * add the bottom five bits of the two to get the subnumber and add
> > + * the next 7 bits of each to get the actual register number. Since
> > + * the indirect may cause us to cross a register boundary, this makes
> > + * it almost useless. We could try and do something clever where we
> > + * use an actual base offset if base_offset % 32 == 0 but that would
> > + * mean we were generating different code depending on the base
> > + * offset. Instead, for the sake of consistency, we'll just do the
> > + * add ourselves.
> > + */
> > + brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
> > + brw_MOV(p, dst, retype(brw_VxH_indirect(0, 0), dst.type));
> > + } else {
> > + brw_MOV(p, addr, indirect_byte_offset);
> > + brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
> > + }
> > + }
> > }
> >
> > void
> >
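
The offset arithmetic in the quoted patch can be illustrated in isolation. The following is only a sketch, not code from the patch or from Mesa: kRegSize, RegOffset, fold_immediate_indirect and low_bits_sum_fits are hypothetical names, and the 32-byte register size is an assumption taken from the "bottom 5 bits" wording in the patch comment. The first helper mirrors how a constant indirect is folded into the register number and subnumber; the second expresses the pre-Broadwell condition that the patch comment describes.

```cpp
// Assumption: a register is 32 bytes, standing in for REG_SIZE in the patch.
constexpr unsigned kRegSize = 32;

// Hypothetical stand-in for the nr/subnr pair of a register reference.
struct RegOffset {
    unsigned nr;     // register number
    unsigned subnr;  // byte offset within that register
};

// Fold a constant indirect byte offset into a base register reference,
// as the immediate case in the patch does before emitting a plain MOV.
constexpr RegOffset fold_immediate_indirect(unsigned base_nr,
                                            unsigned base_subnr,
                                            unsigned indirect_bytes)
{
    return RegOffset{
        (base_nr * kRegSize + base_subnr + indirect_bytes) / kRegSize,
        (base_nr * kRegSize + base_subnr + indirect_bytes) % kRegSize};
}

// The pre-Broadwell condition as stated in the patch comment: the bottom
// 5 bits of the base offset and of the indirect must add to less than 32.
constexpr bool low_bits_sum_fits(unsigned base_offset, unsigned indirect)
{
    return (base_offset & 31u) + (indirect & 31u) < 32u;
}

// Compile-time checks of the arithmetic above.
static_assert(fold_immediate_indirect(2, 16, 24).nr == 3, "crosses into r3");
static_assert(fold_immediate_indirect(2, 16, 24).subnr == 8, "8 bytes into r3");
static_assert(low_bits_sum_fits(104, 20), "8 + 20 stays below 32");
static_assert(!low_bits_sum_fits(104, 28), "8 + 28 reaches 36");
```

Because the indirect is only known at run time, the second condition cannot be guaranteed in general, which is consistent with the patch doing the full add into the address register on pre-Broadwell hardware and passing a zero immediate to the indirect access.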