[Mesa-dev] [PATCH v2 1/3] i965/fs: fix indirect load DF uniforms on BSW/BXT

Samuel Iglesias Gonsálvez siglesias at igalia.com
Mon Feb 20 12:05:48 UTC 2017


On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote:
> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote:
> > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > 
> > > The lowered BSW/BXT indirect move instructions had incorrect
> > > source types, which luckily wasn't causing incorrect assembly to
> > > be
> > > generated due to the bug fixed in the next patch, but would have
> > > confused the remaining back-end IR infrastructure due to the
> > > mismatch
> > > between the IR source types and the emitted machine code.
> > > 
> > > v2:
> > > - Improve commit log (Curro)
> > > - Fix read_size (Curro)
> > > - Fix DF uniform array detection in assign_constant_locations()
> > > when
> > >   it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
> > > 
> > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias at igalia.com>
> > > Cc: "17.0" <mesa-stable at lists.freedesktop.org>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs.cpp     | 11 ++++++++-
> > >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 ++++++++++++++++--
> > > --------------
> > >  2 files changed, 30 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > index c348bc7138d..93ab84b5845 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > > @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations()
> > >              unsigned last = constant_nr + (inst->src[2].ud / 4)
> > > -
> > > 1;
> > >              assert(last < uniforms);
> > >  
> > > +            bool supports_64bit_indirects =
> > > +               !devinfo->is_cherryview && !devinfo->is_broxton;
> > > +            /* Detect if this is as a result 64-bit MOV
> > > INDIRECT.
> > > In case
> > > +             * of BSW/BXT, we substitute each DF MOV INDIRECT by
> > > two 32-bit MOV
> > > +             * INDIRECT.
> > > +             */
> > > +            bool mov_indirect_64bit = (type_sz(inst-
> > > >src[i].type)
> > > == 8) ||
> > > +               (!supports_64bit_indirects && inst->dst.type ==
> > > BRW_REGISTER_TYPE_UD &&
> > > +                inst->src[0].type == BRW_REGISTER_TYPE_UD &&
> > > inst-
> > > > dst.stride == 2);
> > 
> > This seems kind of fragile, I don't think the optimizer gives you
> > any
> > guarantees that the stride of a lowered 64-bit indirect move will
> > remain
> > equal to two, or that the destination stride of an actual 32-bit
> > indirect uniform load will never end up being two as well.  That
> > said,
> > because you access these with 32-bit indirect moves, I don't see
> > why
> > you'd need to treat them as 64-bit uniforms here, the usual
> > alignment
> > requirements for 64-bit uniforms no longer apply, so you can treat
> > them
> > as regular 32-bit uniforms AFAICT.  Why did you add this hunk?
> > 
> 
> I added it because of this case: if we access to one DF uniform array
> element with a normal MOV and the rest with MOV INDIRECT, we will
> mark
> the former as a live 64bit variable. Then we have the array scattered
> as part of it is uploaded as a 64-bits uniform and the other as 32-
> bits. Even if we avoid this by uploading everything together as 32-
> bits, then the access to that DF could not be aligned to 64-bits.
> 
> So my idea was to find a way to identify somehow those MOV INDIRECT
> in
> BSW to mark all the array as a 64-bit one.
> 

Mmm, maybe I can fix this case without the hack I did. I can add the
following code after marking all live variables accessed by the
instructions.

It is very similar to the one to detect live variables but it is fixing
the case where any MOV INDIRECT in BSW is accessing to an uniform array
of DF elements where one of these elements is directly accessed by
another instruction.

What do you think?

Sam

+   /* Detect if we accessed to an array of DF uniforms in both direct
and
+    * indirect ways on BSW/BXT. If this is the case, then we need
+    * to take into account in order to push all the elements together.
+    */
+   if (devinfo->is_cherryview || devinfo->is_broxton) {
+      foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
+         for (int i = 0 ; i < inst->sources; i++) {
+            if (inst->src[i].file != UNIFORM)
+               continue;
+
+            int constant_nr = inst->src[i].nr + inst->src[i].offset /
4;
+
+            if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0)
{
+               assert(inst->src[2].ud % 4 == 0);
+               unsigned last = constant_nr + (inst->src[2].ud / 4) -
1;
+               assert(last < uniforms);
+
+               bool is_df_array = false;
+               for (unsigned j = constant_nr; j < last; j++) {
+                  if (is_live_64bit[j]) {
+                     is_df_array = true;
+                     break;
+                  }
+               }
+
+               if (is_df_array) {
+                  /* If part of it is accessed with direct addressing
as a DF
+                   * then we mark the whole array as DF.
+                   */
+                  for (unsigned j = constant_nr; j < last; j++)
+                     is_live_64bit[j] = true;
+               }
+            }
+         }
+      }
+   }
+

> > Other than that patch looks good to me.
> > 
> 
> OK, thanks.
> 
> Sam
> 
> > >              for (unsigned j = constant_nr; j < last; j++) {
> > >                 is_live[j] = true;
> > >                 contiguous[j] = true;
> > > -               if (type_sz(inst->src[i].type) == 8) {
> > > +               if (mov_indirect_64bit) {
> > >                    is_live_64bit[j] = true;
> > >                 }
> > >              }
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > index 991c20fad62..4aaf84964e9 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const
> > > fs_builder &bld, nir_intrinsic_instr *instr
> > >           unsigned read_size = instr->const_index[1] -
> > >              (instr->num_components - 1) * type_sz(dest.type);
> > >  
> > > -         fs_reg indirect_chv_high_32bit;
> > > -         bool is_chv_bxt_64bit =
> > > -            (devinfo->is_cherryview || devinfo->is_broxton) &&
> > > -            type_sz(dest.type) == 8;
> > > -         if (is_chv_bxt_64bit) {
> > > -            indirect_chv_high_32bit =
> > > vgrf(glsl_type::uint_type);
> > > -            /* Calculate indirect address to read high 32 bits
> > > */
> > > -            bld.ADD(indirect_chv_high_32bit, indirect,
> > > brw_imm_ud(4));
> > > -         }
> > > +         bool supports_64bit_indirects =
> > > +            !devinfo->is_cherryview && !devinfo->is_broxton;
> > >  
> > > -         for (unsigned j = 0; j < instr->num_components; j++) {
> > > -            if (!is_chv_bxt_64bit) {
> > > +         if (type_sz(dest.type) != 8 ||
> > > supports_64bit_indirects)
> > > {
> > > +            for (unsigned j = 0; j < instr->num_components; j++)
> > > {
> > >                 bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > >                          offset(dest, bld, j), offset(src, bld,
> > > j),
> > >                          indirect, brw_imm_ud(read_size));
> > > -            } else {
> > > -               bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > > -                        subscript(offset(dest, bld, j),
> > > BRW_REGISTER_TYPE_UD, 0),
> > > -                        offset(src, bld, j),
> > > -                        indirect, brw_imm_ud(read_size));
> > > -
> > > -               bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > > -                        subscript(offset(dest, bld, j),
> > > BRW_REGISTER_TYPE_UD, 1),
> > > -                        offset(src, bld, j),
> > > -                        indirect_chv_high_32bit,
> > > brw_imm_ud(read_size));
> > > +            }
> > > +         } else {
> > > +            const unsigned num_mov_indirects =
> > > +               type_sz(dest.type) /
> > > type_sz(BRW_REGISTER_TYPE_UD);
> > > +            /* We read a little bit less per MOV INDIRECT, as
> > > they
> > > are now
> > > +             * 32-bits ones instead of 64-bit. Fix read_size
> > > then.
> > > +             */
> > > +            const unsigned read_size_32bit = read_size -
> > > +                (num_mov_indirects - 1) *
> > > type_sz(BRW_REGISTER_TYPE_UD);
> > > +            for (unsigned j = 0; j < instr->num_components; j++)
> > > {
> > > +               for (unsigned i = 0; i < num_mov_indirects; i++)
> > > {
> > > +                  bld.emit(SHADER_OPCODE_MOV_INDIRECT,
> > > +                           subscript(offset(dest, bld, j),
> > > BRW_REGISTER_TYPE_UD, i),
> > > +                           subscript(offset(src, bld, j),
> > > BRW_REGISTER_TYPE_UD, i),
> > > +                           indirect,
> > > brw_imm_ud(read_size_32bit));
> > > +               }
> > >              }
> > >           }
> > >        }
> > > -- 
> > > 2.11.0
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170220/b04b7c9d/attachment.sig>


More information about the mesa-dev mailing list