[Mesa-dev] [PATCH 1/3] i965: Move down genX_upload_sbe in profiles.
Mathias Fröhlich
Mathias.Froehlich at gmx.net
Tue Jun 12 06:10:43 UTC 2018
Hi Chris,
thanks for looking into that!
On Saturday, 2 June 2018 09:58:14 CEST Chris Wilson wrote:
> Quoting Mathias.Froehlich at gmx.net (2018-05-17 07:38:26)
> > From: Mathias Fröhlich <mathias.froehlich at web.de>
> >
> > Avoid looping over all VARYING_SLOT_MAX urb_setup array
> > entries from genX_upload_sbe. Prepare an array indirection
> > to the active entries of urb_setup already in the compile
> > step. On upload only walk the active arrays.
> >
> > Signed-off-by: Mathias Fröhlich <Mathias.Froehlich at web.de>
> > ---
> > src/intel/compiler/brw_compiler.h | 7 +++++++
> > src/intel/compiler/brw_fs.cpp | 23 +++++++++++++++++++++++
> > src/intel/compiler/brw_fs.h | 2 ++
> > src/intel/compiler/brw_fs_visitor.cpp | 1 +
> > src/mesa/drivers/dri/i965/genX_state_upload.c | 7 +++----
> > 5 files changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
> > index 8b4e6fe2e2..a9df45e00d 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -743,6 +743,13 @@ struct brw_wm_prog_data {
> > * For varying slots that are not used by the FS, the value is -1.
> > */
> > int urb_setup[VARYING_SLOT_MAX];
> > + /**
> > + * Cache structure into the urb_setup array above that contains the
> > + * attribute numbers of active varyings out of urb_setup.
> > + * The actual count is stored in urb_setup_attribs_count.
> > + */
> > + int urb_setup_attribs[VARYING_SLOT_MAX];
> > + int urb_setup_attribs_count;
> > };
> >
> > struct brw_push_const_block {
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index b21996c168..6930414067 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup()
> > this->first_non_payload_grf = payload.num_regs + prog_data->curb_read_length;
> > }
> >
> > +/*
> > + * Build up an array of indices into the urb_setup array that
> > + * references the active entries of the urb_setup array.
> > + * Used to accelerate walking the active entries of the urb_setup array
> > + * on each upload.
> > + */
> > +void
> > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data)
> > +{
> > + int index = 0;
> > + for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
> > + int input_index = wm_prog_data->urb_setup[attr];
> > + if (input_index < 0)
> > + continue;
> > + wm_prog_data->urb_setup_attribs[index++] = attr;
>
> So far the only consumer of this wants the input_index again.
> Does that change, or is it worth including both to avoid the trawl?
Hmm, I don't know too much about the internal requirements of the hardware in this regard.
But one property of the current code is that it orders the varying slots in
urb_setup[] by ascending attribute index.
So if we collapse the urb_setup[] and urb_setup_index[] arrays into something like

struct {
   uint8_t attrib;
   uint8_t index;
} urb_setup[VARYING_SLOT_MAX];
uint8_t urb_setup_count;

do we need to sort that afterwards by attribute before we can use it
in genX_upload_sbe?
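A minimal sketch of that collapsed layout, assuming stand-in names (`wm_prog_data_sketch`, `compute_urb_setup_entries`, and a fixed `VARYING_SLOT_MAX`) rather than the real Mesa types: since the fill loop walks the attributes in ascending order, the resulting array comes out already sorted by attribute, so under these assumptions no extra sort pass would be needed before a genX_upload_sbe-style walk.

```c
/* Hypothetical sketch, not the actual Mesa code: collapsing urb_setup[]
 * and the index array into one array of (attrib, index) pairs.
 * VARYING_SLOT_MAX and the struct names are stand-ins for the real ones. */
#include <assert.h>
#include <stdint.h>

#define VARYING_SLOT_MAX 64

struct urb_setup_entry {
   uint8_t attrib; /* varying slot number */
   uint8_t index;  /* the urb_setup[] value, i.e. the FS input index */
};

struct wm_prog_data_sketch {
   int urb_setup[VARYING_SLOT_MAX];
   struct urb_setup_entry entries[VARYING_SLOT_MAX];
   uint8_t count;
};

/* Walks attr in ascending order, so entries[] stays sorted by attrib. */
static void
compute_urb_setup_entries(struct wm_prog_data_sketch *d)
{
   d->count = 0;
   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
      int input_index = d->urb_setup[attr];
      if (input_index < 0)
         continue;
      d->entries[d->count].attrib = (uint8_t)attr;
      d->entries[d->count].index = (uint8_t)input_index;
      d->count++;
   }
}
```

This would also answer the "include both to avoid the trawl" point above, since the consumer gets the input index alongside the attribute without re-reading urb_setup[].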
> Is uint8_t (with a STATIC_ASSERT) good enough?
Sure, I used int mainly to match the neighboring declaration and stick
with the 'surrounding coding style'. That's changed to uint8_t in a v2 version.
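For illustration, a hedged sketch of the uint8_t variant with the compile-time check; `STATIC_ASSERT` here is a local stand-in for Mesa's macro, and the struct name and `VARYING_SLOT_MAX` value are placeholders:

```c
/* Sketch only: uint8_t attribute indices with a compile-time guarantee
 * that every varying slot number fits in a byte. */
#include <stdint.h>

#define VARYING_SLOT_MAX 64
#define STATIC_ASSERT(cond) _Static_assert(cond, #cond)

/* Fails the build if a slot number could ever overflow uint8_t. */
STATIC_ASSERT(VARYING_SLOT_MAX <= UINT8_MAX + 1);

struct wm_prog_data_sketch {
   int urb_setup[VARYING_SLOT_MAX];
   uint8_t urb_setup_attribs[VARYING_SLOT_MAX];
   uint8_t urb_setup_attribs_count;
};
```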
We could reach an even smaller cache footprint by using a single uint64_t and
iterating it with u_bit_scan64(). But I have received some general headwind lately from
someone who did not like these bitmask loops. So I apply bitmask iteration only in
places where the bitmasks really provide a larger improvement than just a smaller
cache footprint.
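The bitmask alternative would look roughly like the following; the scan helper is a local reimplementation modeled on Mesa's u_bit_scan64() from util/u_math.h, not the real function, and the consumer is a placeholder:

```c
/* Sketch: encode the active varying slots in one uint64_t and walk
 * only the set bits, u_bit_scan64()-style. */
#include <assert.h>
#include <stdint.h>

/* Return the index of the lowest set bit and clear it in *mask.
 * Caller must ensure *mask != 0. */
static int
bit_scan64(uint64_t *mask)
{
   int i = __builtin_ctzll(*mask);
   *mask &= *mask - 1;
   return i;
}

/* Walk only the active slots; a real consumer would index
 * urb_setup[attr] inside the loop instead of just counting. */
static int
count_active_slots(uint64_t active_mask)
{
   int n = 0;
   while (active_mask) {
      int attr = bit_scan64(&active_mask);
      (void)attr;
      n++;
   }
   return n;
}
```

The loop body runs once per active slot rather than VARYING_SLOT_MAX times, which is where the win over the plain array scan would come from.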
What do you think?
best
Mathias