[Mesa-dev] [PATCH v2 09/20] i965/fs: indirect addressing with doubles is not supported in IVB/BYT
Samuel Iglesias Gonsálvez
siglesias at igalia.com
Mon Feb 13 06:09:20 UTC 2017
On Fri, 2017-02-10 at 10:10 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>
> > On Thu, 2017-02-09 at 12:18 -0800, Francisco Jerez wrote:
> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > >
> > > > It is tested empirically that IVB/BYT don't support indirect
> > > > addressing
> > > > with doubles but it is not documented in the PRM.
> > > >
> > > > This patch applies the same solution than for
> > > > Cherryview/Broxton
> > > > and
> > > > takes into account that we cannot double the stride, since the
> > > > hardware will do it internally.
> > > >
> > > > v2:
> > > > - Fix assert to take into account Indirect DF MOVs in IVB and
> > > > HSW.
> > > >
> > > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> > > > ---
> > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
> > > > ++++++++++++++++++--------
> > > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 11 ++++++--
> > > > ---
> > > > 2 files changed, 25 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > index 487f2e90224..dd6cbab773c 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > @@ -418,18 +418,29 @@
> > > > fs_generator::generate_mov_indirect(fs_inst
> > > > *inst,
> > > > brw_MOV(p, dst, reg);
> > > > } else {
> > > > /* Prior to Broadwell, there are only 8 address
> > > > registers.
> > > > */
> > > > - assert(inst->exec_size == 8 || devinfo->gen >= 8);
> > > > + assert(inst->exec_size == 8 || devinfo->gen >= 8 ||
> > > > + (devinfo->gen == 7 && inst->exec_size < 8 &&
> > > > + (type_sz(reg.type) == 8 || type_sz(dst.type) ==
> > > > 8)));
> > > >
> > > > /* We use VxH indirect addressing, clobbering a0.0
> > > > through
> > > > a0.7. */
> > > > struct brw_reg addr = vec8(brw_address_reg(0));
> > > >
> > > > - /* The destination stride of an instruction (in bytes)
> > > > must
> > > > be greater
> > > > - * than or equal to the size of the rest of the
> > > > instruction. Since the
> > > > - * address register is of type UW, we can't use a D-type
> > > > instruction.
> > > > - * In order to get around this, re retype to UW and use
> > > > a
> > > > stride.
> > > > - */
> > > > - indirect_byte_offset =
> > > > - retype(spread(indirect_byte_offset, 2),
> > > > BRW_REGISTER_TYPE_UW);
> > > > + if (devinfo->gen != 7 || devinfo->is_haswell ||
> > > > type_sz(reg.type) != 8) {
> > > > + /* The destination stride of an instruction (in
> > > > bytes)
> > > > must be greater
> > > > + * than or equal to the size of the rest of the
> > > > instruction. Since the
> > > > + * address register is of type UW, we can't use a D-
> > > > type
> > > > instruction.
> > > > + * In order to get around this, re retype to UW and
> > > > use a
> > > > stride.
> > > > + */
> > > > + indirect_byte_offset =
> > > > + retype(spread(indirect_byte_offset, 2),
> > > > BRW_REGISTER_TYPE_UW);
> > > > + } else {
> > > > + /* In Ivybridge/Baytrail, when it operates with DF
> > > > operands, we
> > > > + * cannot double the stride, since the hardware will
> > > > do
> > > > it
> > > > + * internally. Tested empirically.
> > > > + */
> > > > + indirect_byte_offset =
> > > > + retype(indirect_byte_offset,
> > > > BRW_REGISTER_TYPE_UW);
> > > > + }
> > >
> > > This doesn't make any sense, indirect_byte_offset is a 32-bit
> > > integer
> > > type regardless of the type of the data you're copying, the
> > > hardware
> > > won't do any stride doubling for you here.
> > >
> >
> > Yeah, I realised this was wrong when looking at the bug mentioned
> > by
> > Matt. Hence my work to provide a proper solution, which we
> > discussed
> > privately.
> >
> > I've verified on IVB hardware that indirect addressing works for
> > 64-
> > bit
> > > moves, but getting it to do what you want is a bit trickier than
> > > on
> > > more
> > > recent hardware -- I'll reply with a patch that gets it working.
> > >
> >
> > OK
> >
> > > >
> > > > /* There are a number of reasons why we don't use the
> > > > base
> > > > offset here.
> > > > * One reason is that the field is only 9 bits which
> > > > means
> > > > we can only
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > > index 8f745dff440..85f43b7b144 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > > @@ -3822,17 +3822,18 @@ fs_visitor::nir_emit_intrinsic(const
> > > > fs_builder &bld, nir_intrinsic_instr *instr
> > > > (instr->num_components - 1) * type_sz(dest.type);
> > > >
> > > > fs_reg indirect_chv_high_32bit;
> > > > - bool is_chv_bxt_64bit =
> > > > - (devinfo->is_cherryview || devinfo->is_broxton) &&
> > > > - type_sz(dest.type) == 8;
> > > > - if (is_chv_bxt_64bit) {
> > > > + bool is_ivb_byt_chv_bxt_64bit =
> > >
> > > Let's use a more sensible name here like
> > > "supports_64bit_indirects"
> > > and
> > > mark as const... Or just remove the variable altogether and fold
> > > this
> > > into the condition of the second if statement below.
> > >
> >
> > OK
> >
> > > > + (devinfo->is_cherryview || devinfo->is_broxton ||
> > > > + devinfo->is_ivybridge || devinfo->is_baytrail) &&
> > > > + type_sz(dest.type) == 8;
> > > > + if (is_ivb_byt_chv_bxt_64bit) {
> > > > indirect_chv_high_32bit =
> > > > vgrf(glsl_type::uint_type);
> > > > /* Calculate indirect address to read high 32 bits
> > > > */
> > > > bld.ADD(indirect_chv_high_32bit, indirect,
> > > > brw_imm_ud(4));
> > >
> > > The ADD and temporary seem redundant, you could take into account
> > > the
> > > additional 32bit offset by applying an immediate offset to the
> > > 'src'
> > > region used below.
> > >
> > > > }
> > > >
> > > > for (unsigned j = 0; j < instr->num_components; j++)
> > > > {
> > > > - if (!is_chv_bxt_64bit) {
> > > > + if (!is_ivb_byt_chv_bxt_64bit) {
> > >
> > > The code below seems pretty broken... You lower a 64-bit to 64-
> > > bit
> > > indirect move into two 64-bit to UD type-converting moves, which
> > > is
> > > unlikely to do what you want.
> > > You need to lower it into two raw 32-bit
> > > to 32bit moves (e.g. by using subscript() in the source as well
> > > as on
> > > the destination). Please fix this (and CC the patch to mesa-
> > > stable)
> > > regardless of my patch to get 64-bit MOV_INDIRECT working,
> > > CHV/BXT
> > > are
> > > unlikely to work as expected except by luck.
> > >
> >
> > This is likely working because in
> > fs_generator::generate_mov_indirect()
> > the source of the MOV is retyped to destination's type (at this
> > case
> > an UD). So, at the end, there is no generation of a 64-bit to UD
> > converting move.
> >
>
> Yeah, that likely prevented things from failing catastrophically
> [this
> is what I call luck :P], but generate_mov_indirect() dropping the
> source
> type on the floor likely qualifies as a bug in its own right...
>
I agree. I am going to fix this too.
Sam
> > But you are right, this is wrong. I'll fix it.
> >
> > Thanks for the feedback!
> >
> > Sam
> >
> > > > bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > > > offset(dest, bld, j), offset(src, bld,
> > > > j),
> > > > indirect, brw_imm_ud(read_size));
> > > > --
> > > > 2.11.0
> > > >
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170213/d857af5d/attachment.sig>
More information about the mesa-dev
mailing list