[Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.

Mathias Fröhlich Mathias.Froehlich at gmx.net
Tue Jun 12 06:10:43 UTC 2018


Hi Chris,

thanks for looking into that!

On Saturday, 2 June 2018 09:58:14 CEST Chris Wilson wrote:
> Quoting Mathias.Froehlich at gmx.net (2018-05-17 07:38:26)
> > From: Mathias Fröhlich <mathias.froehlich at web.de>
> > 
> > Avoid looping over all VARYING_SLOT_MAX urb_setup array
> > entries from genX_upload_sbe. Prepare an array indirection
> > to the active entries of urb_setup already in the compile
> > step. On upload only walk the active arrays.
> > 
> > Signed-off-by: Mathias Fröhlich <Mathias.Froehlich at web.de>
> > ---
> >  src/intel/compiler/brw_compiler.h             |  7 +++++++
> >  src/intel/compiler/brw_fs.cpp                 | 23 +++++++++++++++++++++++
> >  src/intel/compiler/brw_fs.h                   |  2 ++
> >  src/intel/compiler/brw_fs_visitor.cpp         |  1 +
> >  src/mesa/drivers/dri/i965/genX_state_upload.c |  7 +++----
> >  5 files changed, 36 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
> > index 8b4e6fe2e2..a9df45e00d 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -743,6 +743,13 @@ struct brw_wm_prog_data {
> >      * For varying slots that are not used by the FS, the value is -1.
> >      */
> >     int urb_setup[VARYING_SLOT_MAX];
> > +   /**
> > +    * Cache structure into the urb_setup array above that contains the
> > +    * attribute numbers of active varyings out of urb_setup.
> > +    * The actual count is stored in urb_setup_attribs_count.
> > +    */
> > +   int urb_setup_attribs[VARYING_SLOT_MAX];
> > +   int urb_setup_attribs_count;
> >  };
> >  
> >  struct brw_push_const_block {
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index b21996c168..6930414067 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup()
> >     this->first_non_payload_grf = payload.num_regs + prog_data->curb_read_length;
> >  }
> >  
> > +/*
> > + * Build up an array of indices into the urb_setup array that
> > + * references the active entries of the urb_setup array.
> > + * Used to accelerate walking the active entries of the urb_setup array
> > + * on each upload.
> > + */
> > +void
> > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data)
> > +{
> > +   int index = 0;
> > +   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
> > +      int input_index = wm_prog_data->urb_setup[attr];
> > +      if (input_index < 0)
> > +         continue;
> > +      wm_prog_data->urb_setup_attribs[index++] = attr;
> 
> So far the only consumer of this wants the input_index again.
> Does that change, or is it worth including both to avoid the trawl?

Hmm, I don't know too much about the hardware's internal requirements in this regard.
But one property of the current code is that it orders the varying slots in
urb_setup[] by ascending attribute index.
So if we collapse the urb_setup[] and urb_setup_index[] arrays into something like

struct {
   uint8_t attrib;
   uint8_t index;
} urb_setup[VARYING_SLOT_MAX];
uint8_t urb_setup_count;

do we need to sort that afterwards by attribute before we can use that
in genX_upload_sbe?
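
For concreteness, here is a minimal sketch of how that collapsed array could be
built from the classic urb_setup[] table (struct and function names are made up
for illustration, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define VARYING_SLOT_MAX 64 /* placeholder value for this sketch */

struct urb_setup_entry {
   uint8_t attrib; /* varying slot number */
   uint8_t index;  /* the urb_setup[] value, i.e. the FS input index */
};

/* Build the compact array from a urb_setup[] table in which unused
 * slots hold -1.  Because the loop walks attr in ascending order, the
 * resulting entries are already sorted by attribute index. */
static uint8_t
compact_urb_setup(const int *urb_setup, struct urb_setup_entry *out)
{
   uint8_t count = 0;
   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
      if (urb_setup[attr] < 0)
         continue;
      out[count].attrib = attr;
      out[count].index = urb_setup[attr];
      count++;
   }
   return count;
}
```

If the array is built this way, it inherits the ascending-attribute ordering of
the current code, so no extra sort would be needed before consuming it.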

> Is uint8_t (with a STATIC_ASSERT) good enough?

Sure. I had just matched the neighboring declaration to stick with the
'surrounding coding style'. That is changed here in a v2 version.

We could reach an even smaller cache footprint by using a single uint64_t and
iterating it with u_bit_scan64(). But I have received some general headwind lately
from someone who did not like these bitmask loops. So I apply bitmask iteration
only in places where the bitmasks provide a larger improvement than just a
smaller cache footprint.
What do you think?
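
For reference, the bitmask variant I have in mind would look roughly like the
sketch below. u_bit_scan64() is Mesa's helper from util/bitscan.h; a stand-in
is re-implemented inline here (using a GCC/Clang builtin) so the sketch is
self-contained:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Mesa's u_bit_scan64() from util/bitscan.h:
 * return the index of the lowest set bit and clear it. */
static inline int
bit_scan64(uint64_t *mask)
{
   const int i = __builtin_ctzll(*mask);
   *mask &= *mask - 1;
   return i;
}

/* Walk only the active varying slots, encoded as set bits in the
 * mask, in ascending attribute order. */
static int
collect_active_attribs(uint64_t active_mask, int *attribs)
{
   int count = 0;
   uint64_t mask = active_mask;
   while (mask)
      attribs[count++] = bit_scan64(&mask);
   return count;
}
```

The per-iteration cost is one bit-scan plus one mask update, and the whole set
of active slots fits in a single 8-byte word, which is where the cache-footprint
argument comes from.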

best

Mathias
