[Mesa-dev] [PATCH v2] i965/fs: force pull model for 64-bit GS inputs
Iago Toral
itoral at igalia.com
Thu Sep 28 08:14:25 UTC 2017
On Thu, 2017-09-28 at 00:59 -0700, Kenneth Graunke wrote:
> On Thursday, September 28, 2017 12:36:27 AM PDT Iago Toral Quiroga
> wrote:
> > Triggering the push model when 64-bit inputs are involved is not
> > easy due to
> > the constrains on the maximum number of registers that we allow for
> > this mode,
> > however, for GS with 'points' primitive type and just a couple of
> > double
> > varyings we can trigger this and it just doesn't work because the
> > implementation is not 64-bit aware at all. For now, let's make sure
> > that we
> > don't attempt this model whith 64-bit inputs and we always fall
> > back to pull
> > model for them.
> >
> > Also, don't enable the VUE handles in the thread payload on the fly
> > when we
> > find an input for which we need the pull model, this is not safe:
> > if we need
> > to resort to the pull model we need to account for that when we
> > setup the
> > thread payload so we compute the first non-payload register
> > properly. If we
> > didn't do that correctly and we enable it on-the-fly here then we
> > will end up
> > VUE handles on the first non-payload register which will probably
> > lead to
> > GPU hangs. Instead, always enable the VUE handles for the pull
> > model so we
> > can safely use them when needed. The GS is going to resort to pull
> > model
> > almost in every situation anyway, so this shouldn't make a
> > significant
> > difference and it makes things easier and safer.
> >
> > v2: Always enable the VUE handles for pull model, this is easier
> > and safer
> > and the GS is going to fallback to pull model almost always
> > anyway (Ken)
> > ---
> > src/intel/compiler/brw_fs.cpp | 30 +++++++++++++++++--------
> > -----
> > src/intel/compiler/brw_fs_nir.cpp | 4 +++-
> > 2 files changed, 20 insertions(+), 14 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > index eb9b4c38909..1517e87158b 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -5602,25 +5602,29 @@ fs_visitor::setup_gs_payload()
> > payload.num_regs++;
> > }
> >
> > + /* Always enable VUE handles so we can safely use pull model if
> > needed.
> > + *
> > + * The push model for a GS uses a ton of register space even
> > for trivial
> > + * scenarios with just a few inputs, so just make things easier
> > and a bit
> > + * safer by always having pull model available.
> > + */
> > + gs_prog_data->base.include_vue_handles = true;
> > +
> > + /* R3..RN: ICP Handles for each incoming vertex (when using
> > pull model) */
> > + payload.num_regs += nir->info.gs.vertices_in;
> > +
> > /* Use a maximum of 24 registers for push-model inputs. */
> > const unsigned max_push_components = 24;
> >
> > - /* If pushing our inputs would take too many registers, reduce
> > the URB read
> > - * length (which is in HWords, or 8 registers), and resort to
> > pulling.
> > + /* We want to use a maximum of 24 registers for push-model
> > inputs, so make
> > + * sure our URB read length (which is in HWords, or 8
> > registers) respects
> > + * that.
> > *
> > * Note that the GS reads <URB Read Length> HWords for every
> > vertex - so we
> > - * have to multiply by VerticesIn to obtain the total storage
> > requirement.
> > + * have to account for by VerticesIn to obtain the total
> > storage requirement.
> > */
> > - if (8 * vue_prog_data->urb_read_length * nir-
> > >info.gs.vertices_in >
> > - max_push_components || gs_prog_data->invocations > 1) {
> > - gs_prog_data->base.include_vue_handles = true;
> > -
> > - /* R3..RN: ICP Handles for each incoming vertex (when using
> > pull model) */
> > - payload.num_regs += nir->info.gs.vertices_in;
> > -
> > - vue_prog_data->urb_read_length =
> > - ROUND_DOWN_TO(max_push_components / nir-
> > >info.gs.vertices_in, 8) / 8;
> > - }
> > + vue_prog_data->urb_read_length =
> > + ROUND_DOWN_TO(max_push_components / nir-
> > >info.gs.vertices_in, 8) / 8;
> > }
>
> Hmm, won't this set urb_read_length to the maximum all the time?
>
> This code was meant to clamp an existing value to the max, if it
> exceeds that, but leave it less than that if that's all we needed...
Right, for some reason I convinced myself that we could just always use
the maximum, but you're right that doesn't make sense if we have less
than that, I don't know what I was thinking... sorry, I'll send a v3.
Iago
More information about the mesa-dev
mailing list