[Mesa-dev] [PATCH v2 09/20] i965/fs: indirect addressing with doubles is not supported in IVB/BYT

Fri Feb 10 18:10:20 UTC 2017

Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> On Thu, 2017-02-09 at 12:18 -0800, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>> > It is tested empirically that IVB/BYT don't support indirect
>> > addressing
>> > with doubles but it is not documented in the PRM.
>> > 
>> > This patch applies the same solution than for Cherryview/Broxton
>> > and
>> > takes into account that we cannot double the stride, since the
>> > hardware will do it internally.
>> > 
>> > v2:
>> > - Fix assert to take into account Indirect DF MOVs in IVB and HSW.
>> > 
>> > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
>> > ++++++++++++++++++--------
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp       | 11 ++++++-----
>> >  2 files changed, 25 insertions(+), 13 deletions(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > index 487f2e90224..dd6cbab773c 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > @@ -418,18 +418,29 @@ fs_generator::generate_mov_indirect(fs_inst
>> > *inst,
>> >        brw_MOV(p, dst, reg);
>> >     } else {
>> >        /* Prior to Broadwell, there are only 8 address registers.
>> > */
>> > -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
>> > +      assert(inst->exec_size == 8 || devinfo->gen >= 8 ||
>> > +             (devinfo->gen == 7 && inst->exec_size < 8 &&
>> > +              (type_sz(reg.type) == 8 || type_sz(dst.type) ==
>> > 8)));
>> >  
>> >        /* We use VxH indirect addressing, clobbering a0.0 through
>> > a0.7. */
>> >        struct brw_reg addr = vec8(brw_address_reg(0));
>> >  
>> > -      /* The destination stride of an instruction (in bytes) must
>> > be greater
>> > -       * than or equal to the size of the rest of the
>> > instruction.  Since the
>> > -       * address register is of type UW, we can't use a D-type
>> > instruction.
>> > -       * In order to get around this, re retype to UW and use a
>> > stride.
>> > -       */
>> > -      indirect_byte_offset =
>> > -         retype(spread(indirect_byte_offset, 2),
>> > BRW_REGISTER_TYPE_UW);
>> > +      if (devinfo->gen != 7 || devinfo->is_haswell ||
>> > type_sz(reg.type) != 8) {
>> > +         /* The destination stride of an instruction (in bytes)
>> > must be greater
>> > +          * than or equal to the size of the rest of the
>> > instruction.  Since the
>> > +          * address register is of type UW, we can't use a D-type
>> > instruction.
>> > +          * In order to get around this, re retype to UW and use a
>> > stride.
>> > +          */
>> > +         indirect_byte_offset =
>> > +            retype(spread(indirect_byte_offset, 2),
>> > BRW_REGISTER_TYPE_UW);
>> > +      } else {
>> > +         /* In Ivybridge/Baytrail, when it operates with DF
>> > operands, we
>> > +          * cannot double the stride, since the hardware will do
>> > it
>> > +          * internally. Tested empirically.
>> > +          */
>> > +         indirect_byte_offset =
>> > +            retype(indirect_byte_offset, BRW_REGISTER_TYPE_UW);
>> > +      }
>> 
>> This doesn't make any sense, indirect_byte_offset is a 32-bit integer
>> type regardless of the type of the data you're copying, the hardware
>> won't do any stride doubling for you here.
>> 
>
> Yeah, I realised this was wrong when looking at the bug mentioned by
> Matt. Hence my work to provide a proper solution, which we discussed
> privately.
>
> I've verified on IVB hardware that indirect addressing works for 64-
> bit
>> moves, but getting it to do what you want is a bit trickier than on
>> more
>> recent hardware -- I'll reply with a patch that gets it working.
>> 
>
> OK
>
>> >  
>> >        /* There are a number of reasons why we don't use the base
>> > offset here.
>> >         * One reason is that the field is only 9 bits which means
>> > we can only
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > index 8f745dff440..85f43b7b144 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > @@ -3822,17 +3822,18 @@ fs_visitor::nir_emit_intrinsic(const
>> > fs_builder &bld, nir_intrinsic_instr *instr
>> >              (instr->num_components - 1) * type_sz(dest.type);
>> >  
>> >           fs_reg indirect_chv_high_32bit;
>> > -         bool is_chv_bxt_64bit =
>> > -            (devinfo->is_cherryview || devinfo->is_broxton) &&
>> > -            type_sz(dest.type) == 8;
>> > -         if (is_chv_bxt_64bit) {
>> > +         bool is_ivb_byt_chv_bxt_64bit =
>> 
>> Let's use a more sensible name here like "supports_64bit_indirects"
>> and
>> mark as const...  Or just remove the variable altogether and fold
>> this
>> into the condition of the second if statement below.
>> 
>
> OK
>
>> > +            (devinfo->is_cherryview || devinfo->is_broxton ||
>> > +             devinfo->is_ivybridge || devinfo->is_baytrail) &&
>> > +           type_sz(dest.type) == 8;
>> > +         if (is_ivb_byt_chv_bxt_64bit) {
>> >              indirect_chv_high_32bit = vgrf(glsl_type::uint_type);
>> >              /* Calculate indirect address to read high 32 bits */
>> >              bld.ADD(indirect_chv_high_32bit, indirect,
>> > brw_imm_ud(4));
>> 
>> The ADD and temporary seem redundant, you could take into account the
>> additional 32bit offset by applying an immediate offset to the 'src'
>> region used below.
>> 
>> >           }
>> >  
>> >           for (unsigned j = 0; j < instr->num_components; j++) {
>> > -            if (!is_chv_bxt_64bit) {
>> > +            if (!is_ivb_byt_chv_bxt_64bit) {
>> 
>> The code below seems pretty broken...  You lower a 64-bit to 64-bit
>> indirect move into two 64-bit to UD type-converting moves, which is
>> unlikely to do what you want.
>>   You need to lower it into two raw 32-bit
>> to 32bit moves (e.g. by using subscript() in the source as well as on
>> the destination).  Please fix this (and CC the patch to mesa-stable)
>> regardless of my patch to get 64-bit MOV_INDIRECT working, CHV/BXT
>> are
>> unlikely to work as expected except by luck.
>> 
>
> This is likely working because in fs_generator::generate_mov_indirect()
>  the source of the MOV is retyped to destination's type (at this case
> an UD). So, at the end, there is no generation of a 64-bit to UD
> converting move.
>

Yeah, that likely prevented things from failing catastrophically [this
is what I call luck :P], but generate_mov_indirect() dropping the source
type on the floor likely qualifies as a bug in its own right...

> But you are right, this is wrong. I'll fix it.
>
> Thanks for the feedback!
>
> Sam
>
>> >                 bld.emit(SHADER_OPCODE_MOV_INDIRECT,
>> >                          offset(dest, bld, j), offset(src, bld, j),
>> >                          indirect, brw_imm_ud(read_size));
>> > -- 
>> > 2.11.0
>> > 
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170210/e60011c1/attachment.sig>