[Mesa-dev] [PATCH v2 28/42] i965/fs: Handle nir shared variable store intrinsic function

Jordan Justen jordan.l.justen at intel.com
Mon Nov 30 00:28:26 PST 2015


On 2015-11-25 03:07:42, Iago Toral wrote:
> On Tue, 2015-11-17 at 21:55 -0800, Jordan Justen wrote:
> > Signed-off-by: Jordan Justen <jordan.l.justen at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 54 ++++++++++++++++++++++++++++++++
> >  1 file changed, 54 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index e9336fd..c8c6370 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -2330,6 +2330,60 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
> >        break;
> >     }
> >  
> > +   case nir_intrinsic_store_shared_indirect:
> > +      has_indirect = true;
> > +      /* fallthrough */
> > +   case nir_intrinsic_store_shared: {
> > +      assert(devinfo->gen >= 7);
> > +
> > +      /* Block index */
> > +      fs_reg surf_index;
> > +      unsigned index = BRW_SLM_SURFACE_INDEX;
> > +      surf_index = fs_reg(index);
> 
> We don't need the index variable here. Also, this needs to be rebased on
> top of Matt's changes, so you can just do:
> 
> fs_reg surf_index = brw_imm_ud(BRW_SLM_SURFACE_INDEX);
> 
> Also, you need to do the same in the previous patch.

Yeah, I had to fix this up after rebasing on Matt's patches.

> 
> > +      /* Offset */
> > +      fs_reg offset_reg = vgrf(glsl_type::uint_type);
> > +      unsigned const_offset_bytes = 0;
> > +      if (has_indirect) {
> > +         bld.MOV(offset_reg, get_nir_src(instr->src[1]));
> > +      } else {
> > +         const_offset_bytes = instr->const_index[0];
> > +         bld.MOV(offset_reg, fs_reg(const_offset_bytes));
> > +      }
> > +
> > +      /* Value */
> > +      fs_reg val_reg = get_nir_src(instr->src[0]);
> > +
> > +      /* Writemask */
> > +      unsigned writemask = instr->const_index[1];
> > +
> > +      /* Write each component present in the writemask */
> 
> The loop below is exactly the same as the one I wrote in the initial
> implementation of ssbo stores, but Kristian optimized it later so that
> consecutive enabled channels are grouped into a single write message.
> See commit 0cb7d7b4b7c32246. I believe we should do the same here.

I have this implemented in my cs branch, but it was causing the CTS to
fail with 'unable to find a register to spill'.

Hopefully, building on your "Improve emitted code for copies of large
buffer-backed variables" series, I'll be able to put this
optimization back. (And I'll need to do the same for shared
variables...)
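
For reference, the grouping idea discussed above (one write message per
run of consecutive enabled channels, rather than one per channel) can be
sketched outside the driver roughly as follows. This is only an
illustration under stated assumptions: ChannelRun and group_writemask are
hypothetical names, not Mesa code, and the 4-byte channel size is taken
from the "4 * skipped_channels" arithmetic in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type (not part of Mesa): one run of consecutive
// enabled channels in a writemask.
struct ChannelRun {
    unsigned first;   // index of the first enabled channel in the run
    unsigned length;  // number of consecutive enabled channels
};

// Split a writemask into runs of consecutive set bits. Each run would
// correspond to a single write of `length` channels starting at byte
// offset 4 * first (assuming 32-bit channels, as in the patch).
std::vector<ChannelRun> group_writemask(unsigned writemask,
                                        unsigned num_components)
{
    assert(num_components <= 32);  // the mask is a 32-bit value here

    std::vector<ChannelRun> runs;
    unsigned i = 0;
    while (i < num_components) {
        if (!(writemask & (1u << i))) {
            ++i;  // disabled channel: skip it
            continue;
        }
        const unsigned first = i;
        while (i < num_components && (writemask & (1u << i)))
            ++i;  // extend the run over consecutive enabled channels
        runs.push_back({first, i - first});
    }
    return runs;
}
```

With a writemask of 0xB (x, y and w enabled) and 4 components, this
yields two runs: 2 channels at byte offset 0 and 1 channel at byte
offset 12, i.e. two writes instead of the three the per-channel loop
would emit.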

Thanks,

-Jordan

> 
> > +      unsigned skipped_channels = 0;
> > +      for (int i = 0; i < instr->num_components; i++) {
> > +         int component_mask = 1 << i;
> > +         if (writemask & component_mask) {
> > +            if (skipped_channels) {
> > +               if (!has_indirect) {
> > +                  const_offset_bytes += 4 * skipped_channels;
> > +                  bld.MOV(offset_reg, fs_reg(const_offset_bytes));
> > +               } else {
> > +                  bld.ADD(offset_reg, offset_reg,
> > +                           brw_imm_ud(4 * skipped_channels));
> > +               }
> > +               skipped_channels = 0;
> > +            }
> > +
> > +            emit_untyped_write(bld, surf_index, offset_reg,
> > +                               offset(val_reg, bld, i),
> > +                               1 /* dims */, 1 /* size */,
> > +                               BRW_PREDICATE_NONE);
> > +         }
> > +
> > +         skipped_channels++;
> > +      }
> > +      break;
> > +   }
> > +
> >     case nir_intrinsic_load_input_indirect:
> >        has_indirect = true;
> >        /* fallthrough */
> 
> 

