[Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

Mon Feb 20 07:58:39 UTC 2017

On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
> > The lowered BSW/BXT indirect move instructions had incorrect
> > source types, which luckily wasn't causing incorrect assembly to be
> > generated due to the bug fixed in the next patch, but would have
> > confused the remaining back-end IR infrastructure due to the
> > mismatch
> > between the IR source types and the emitted machine code.
> > 
> > v2:
> > - Improve commit log (Curro)
> > - Fix read_size (Curro)
> > - Fix DF uniform array detection in assign_constant_locations()
> > when
> >   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
> > 
> > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> > Cc: "17.0" <mesa-stable at lists.freedesktop.org>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp     | 11 ++++++++-
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 ++++++++++++++++--
> > --------------
> >  2 files changed, 30 insertions(+), 22 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index c348bc7138d..93ab84b5845 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
> >              unsigned last = constant_nr + (inst->src[2].ud / 4) -
> > 1;
> >              assert(last < uniforms);
> >  
> > +            bool supports_64bit_indirects =
> > +               !devinfo->is_cherryview && !devinfo->is_broxton;
> > +            /* Detect if this is as a result 64-bit MOV INDIRECT.
> > In case
> > +             * of BSW/BXT, we substitute each DF MOV INDIRECT by
> > two 32-bit MOV
> > +             * INDIRECT.
> > +             */
> > +            bool mov_indirect_64bit = (type_sz(inst->src[i].type)
> > == 8) ||
> > +               (!supports_64bit_indirects && inst->dst.type ==
> > BRW_REGISTER_TYPE_UD &&
> > +                inst->src[0].type == BRW_REGISTER_TYPE_UD && inst-
> > >dst.stride == 2);
> 
> This seems kind of fragile, I don't think the optimizer gives you any
> guarantees that the stride of a lowered 64-bit indirect move will
> remain
> equal to two, or that the destination stride of an actual 32-bit
> indirect uniform load will never end up being two as well.  That
> said,
> because you access these with 32-bit indirect moves, I don't see why
> you'd need to treat them as 64-bit uniforms here, the usual alignment
> requirements for 64-bit uniforms no longer apply, so you can treat
> them
> as regular 32-bit uniforms AFAICT.  Why did you add this hunk?
> 

I added it because of this case: if we access to one DF uniform array
element with a normal MOV and the rest with MOV INDIRECT, we will mark
the former as a live 64bit variable. Then we have the array scattered
as part of it is uploaded as a 64-bits uniform and the other as 32-
bits. Even if we avoid this by uploading everything together as 32-
bits, then the access to that DF could not be aligned to 64-bits.

So my idea was to find a way to identify somehow those MOV INDIRECT in
BSW to mark all the array as a 64-bit one.

> Other than that patch looks good to me.
> 

OK, thanks.

Sam

> >              for (unsigned j = constant_nr; j < last; j++) {
> >                 is_live[j] = true;
> >                 contiguous[j] = true;
> > -               if (type_sz(inst->src[i].type) == 8) {
> > +               if (mov_indirect_64bit) {
> >                    is_live_64bit[j] = true;
> >                 }
> >              }
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index 991c20fad62..4aaf84964e9 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const
> > fs_builder &bld, nir_intrinsic_instr *instr
> >           unsigned read_size = instr->const_index[1] -
> >              (instr->num_components - 1) * type_sz(dest.type);
> >  
> > -         fs_reg indirect_chv_high_32bit;
> > -         bool is_chv_bxt_64bit =
> > -            (devinfo->is_cherryview || devinfo->is_broxton) &&
> > -            type_sz(dest.type) == 8;
> > -         if (is_chv_bxt_64bit) {
> > -            indirect_chv_high_32bit = vgrf(glsl_type::uint_type);
> > -            /* Calculate indirect address to read high 32 bits */
> > -            bld.ADD(indirect_chv_high_32bit, indirect,
> > brw_imm_ud(4));
> > -         }
> > +         bool supports_64bit_indirects =
> > +            !devinfo->is_cherryview && !devinfo->is_broxton;
> >  
> > -         for (unsigned j = 0; j < instr->num_components; j++) {
> > -            if (!is_chv_bxt_64bit) {
> > +         if (type_sz(dest.type) != 8 || supports_64bit_indirects)
> > {
> > +            for (unsigned j = 0; j < instr->num_components; j++) {
> >                 bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> >                          offset(dest, bld, j), offset(src, bld, j),
> >                          indirect, brw_imm_ud(read_size));
> > -            } else {
> > -               bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > -                        subscript(offset(dest, bld, j),
> > BRW_REGISTER_TYPE_UD, 0),
> > -                        offset(src, bld, j),
> > -                        indirect, brw_imm_ud(read_size));
> > -
> > -               bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > -                        subscript(offset(dest, bld, j),
> > BRW_REGISTER_TYPE_UD, 1),
> > -                        offset(src, bld, j),
> > -                        indirect_chv_high_32bit,
> > brw_imm_ud(read_size));
> > +            }
> > +         } else {
> > +            const unsigned num_mov_indirects =
> > +               type_sz(dest.type) / type_sz(BRW_REGISTER_TYPE_UD);
> > +            /* We read a little bit less per MOV INDIRECT, as they
> > are now
> > +             * 32-bits ones instead of 64-bit. Fix read_size then.
> > +             */
> > +            const unsigned read_size_32bit = read_size -
> > +                (num_mov_indirects - 1) *
> > type_sz(BRW_REGISTER_TYPE_UD);
> > +            for (unsigned j = 0; j < instr->num_components; j++) {
> > +               for (unsigned i = 0; i < num_mov_indirects; i++) {
> > +                  bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > +                           subscript(offset(dest, bld, j),
> > BRW_REGISTER_TYPE_UD, i),
> > +                           subscript(offset(src, bld, j),
> > BRW_REGISTER_TYPE_UD, i),
> > +                           indirect, brw_imm_ud(read_size_32bit));
> > +               }
> >              }
> >           }
> >        }
> > -- 
> > 2.11.0
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170220/2737129b/attachment-0001.sig>