[Mesa-dev] [PATCH v3] i965/fs: force pull model for 64-bit GS inputs

Fri Sep 29 06:20:47 UTC 2017

On Thu, 2017-09-28 at 21:19 -0700, Kenneth Graunke wrote:
> On Thursday, September 28, 2017 1:24:21 AM PDT Iago Toral Quiroga
> wrote:
> > Triggering the push model when 64-bit inputs are involved is not
> > easy due to
> > the constrains on the maximum number of registers that we allow for
> > this mode,
> > however, for GS with 'points' primitive type and just a couple of
> > double
> > varyings we can trigger this and it just doesn't work because the
> > implementation is not 64-bit aware at all. For now, let's make sure
> > that we
> > don't attempt this model whith 64-bit inputs and we always fall
> > back to pull
> > model for them.
> > 
> > Also, don't enable the VUE handles in the thread payload on the fly
> > when we
> > find an input for which we need the pull model, this is not safe:
> > if we need
> > to resort to the pull model we need to account for that when we
> > setup the
> > thread payload so we compute the first non-payload register
> > properly. If we
> > didn't do that correctly and we enable it on-the-fly here then we
> > will end up
> > VUE handles on the first non-payload register which will probably
> > lead to
> > GPU hangs. Instead, always enable the VUE handles for the pull
> > model so we
> > can safely use them when needed. The GS is going to resort to pull
> > model
> > almost in every situation anyway, so this shouldn't make a
> > significant
> > difference and it makes things easier and safer.
> > 
> > v2: Always enable the VUE handles for pull model, this is easier
> > and safer
> >     and the GS is going to fallback to pull model almost always
> > anyway (Ken)
> > 
> > v3: Only clamp the URB read length if we are over the maximum
> > reserved for
> >     push inputs as we were doing in the original code (Ken).
> > ---
> >  src/intel/compiler/brw_fs.cpp     | 16 +++++++++++-----
> >  src/intel/compiler/brw_fs_nir.cpp |  4 +++-
> >  2 files changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > index eb9b4c3890..b13b56cfba 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -5602,6 +5602,17 @@ fs_visitor::setup_gs_payload()
> >        payload.num_regs++;
> >     }
> >  
> > +   /* Always enable VUE handles so we can safely use pull model if
> > needed.
> > +    *
> > +    * The push model for a GS uses a ton of register space even
> > for trivial
> > +    * scenarios with just a few inputs, so just make things easier
> > and a bit
> > +    * safer by always having pull model available.
> > +    */
> > +   gs_prog_data->base.include_vue_handles = true;
> > +
> > +   /* R3..RN: ICP Handles for each incoming vertex (when using
> > pull model) */
> > +   payload.num_regs += nir->info.gs.vertices_in;
> > +
> >     /* Use a maximum of 24 registers for push-model inputs. */
> >     const unsigned max_push_components = 24;
> >  
> > @@ -5613,11 +5624,6 @@ fs_visitor::setup_gs_payload()
> >      */
> >     if (8 * vue_prog_data->urb_read_length * nir-
> > >info.gs.vertices_in >
> >         max_push_components || gs_prog_data->invocations > 1) {
> 
> One last change...sorry!  It doesn't make much sense to clamp to the
> maximum number of push components just because invocations > 1.  We
> only
> needed to consider that in order to include the VUE handles, which we
> now do unconditionally.
> 
> Let's change this to:
> 
>      if (8 * vue_prog_data->urb_read_length * nir-
> >info.gs.vertices_in >
>          max_push_components) {
> 
> With that change,
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
> Looking at this more closely has really shown all the ways I made a
> mess
> of this...so, thanks again for cleaning this all up!

No at all, thank you for all the feedback and reviews! :)

Iago

> > -      gs_prog_data->base.include_vue_handles = true;
> > -
> > -      /* R3..RN: ICP Handles for each incoming vertex (when using
> > pull model) */
> > -      payload.num_regs += nir->info.gs.vertices_in;
> > -
> >        vue_prog_data->urb_read_length =
> >           ROUND_DOWN_TO(max_push_components / nir-
> > >info.gs.vertices_in, 8) / 8;
> >     }
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index d760946e62..aa57bb9d78 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -1915,7 +1915,9 @@ fs_visitor::emit_gs_input_load(const fs_reg
> > &dst,
> >     const unsigned push_reg_count = gs_prog_data-
> > >base.urb_read_length * 8;
> >  
> >     /* TODO: figure out push input layout for invocations == 1 */
> > +   /* TODO: make this work with 64-bit inputs */
> >     if (gs_prog_data->invocations == 1 &&
> > +       type_sz(dst.type) <= 4 &&
> >         offset_const != NULL && vertex_const != NULL &&
> >         4 * (base_offset + offset_const->u32[0]) < push_reg_count)
> > {
> >        int imm_offset = (base_offset + offset_const->u32[0]) * 4 +
> > @@ -1928,7 +1930,7 @@ fs_visitor::emit_gs_input_load(const fs_reg
> > &dst,
> >     }
> >  
> >     /* Resort to the pull model.  Ensure the VUE handles are
> > provided. */
> > -   gs_prog_data->base.include_vue_handles = true;
> > +   assert(gs_prog_data->base.include_vue_handles);
> >  
> >     unsigned first_icp_handle = gs_prog_data->include_primitive_id
> > ? 3 : 2;
> >     fs_reg icp_handle = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
> > 
> 
>