[Mesa-dev] [PATCH v3] i965: skip reading unused slots at the begining of the URB for the FS

Iago Toral itoral at igalia.com
Fri Sep 29 11:08:17 UTC 2017


On Fri, 2017-09-29 at 03:53 -0700, Kenneth Graunke wrote:
> On Friday, September 29, 2017 3:36:34 AM PDT Iago Toral Quiroga
> wrote:
> > We can start reading the URB at the first offset that contains
> > varyings
> > that are actually read in the URB. We still need to make sure that
> > we
> > read at least one varying to honor hardware requirements.
> > 
> > This helps alleviate a problem introduced with 99df02ca26f61 for
> > separate shader objects: without separate shader objects we assign
> > locations sequentially, however, since that commit we have changed
> > the
> > method for SSO so that the VUE slot assigned depends on the number
> > of
> > builtin slots plus the location assigned to the varying. This fixed
> > layout is intended to help SSO programs by avoiding on-the-fly
> > recompiles
> > when swapping out shaders, however, it also means that if a varying
> > uses
> > a large location number close to the maximum allowed by the SF/FS
> > units
> > (31), then the offset introduced by the number of builtin slots can
> > push
> > the location outside the range and trigger an assertion.
> > 
> > This problem is affecting at least the following CTS tests for
> > enhanced layouts:
> > 
> > KHR-GL45.enhanced_layouts.varying_array_components
> > KHR-GL45.enhanced_layouts.varying_array_locations
> > KHR-GL45.enhanced_layouts.varying_components
> > KHR-GL45.enhanced_layouts.varying_locations
> > 
> > which use SSO and the the location layout qualifier to select such
> > location numbers explicitly.
> > 
> > This change helps these tests because for SSO we always have to
> > include
> > things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment
> > shader is
> > very unlikely to read them, so by doing this we free builtin slots
> > from
> > the fixed VUE layout and we avoid the tests to crash in this
> > scenario.
> > 
> > Of course, this is not a proper fix, we'd still run into problems
> > if someone
> > tries to use an explicit max location and read gl_ViewportIndex,
> > gl_LayerID or
> > gl_CullDistancein in the FS, but that would be a much less common
> > bug and we
> > can probably wait to see if anyone actually runs into that
> > situation in a real
> > world scenario before making the decision that more aggresive
> > changes are
> > required to support this without reverting 99df02ca26f61.
> > 
> > v2:
> > - Add a debug message when we skip clip distances (Ilia)
> > - we also need to account for this when we compute the urb setup
> >   for the fragment shader stage, so add a compiler util to compute
> >   the first slot that we need to read from the URB instead of
> >   replicating the logic in both places.
> > 
> > v3:
> > - Make the util more generic so it can account for all unused slots
> >   at the beginning of the URB, that will make it more useful (Ken).
> > - Drop the debug message, it was not what Ilia was asking for.
> > 
> > Suggested-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  src/intel/compiler/brw_compiler.h             | 29
> > +++++++++++++++++++++++++++
> >  src/intel/compiler/brw_fs.cpp                 |  7 +++----
> >  src/mesa/drivers/dri/i965/genX_state_upload.c | 16 ++++++---------
> >  3 files changed, 38 insertions(+), 14 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_compiler.h
> > b/src/intel/compiler/brw_compiler.h
> > index 6753a8daf0..038f3f9551 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -1222,6 +1222,35 @@ brw_stage_has_packed_dispatch(const struct
> > gen_device_info *devinfo,
> >     }
> >  }
> >  
> > +/**
> > + * Computes the first varying slot in the URB produced by the
> > previous stage
> > + * that is used in the next stage. We do this by testing the
> > varying slots in
> > + * the previous stage's vue map against the inputs read in the
> > next stage.
> > + *
> > + * Note that:
> > + *
> > + * - Each URB offset contains two varying slots and we can only
> > skip a
> > + *   full offset if both slots are unused, so the value we return
> > here is always
> > + *   rounded down to the closest multiple of two.
> > + *
> > + * - gl_Layer and gl_ViewportIndex don't have their own varying
> > slots, they are
> > + *   part of the vue header, so if these are read we can't skip
> > anything.
> > + */
> > +static inline int
> > +brw_compute_first_urb_slot_required(uint64_t inputs_read,
> > +                                    const struct brw_vue_map
> > *prev_stage_vue_map)
> > +{
> > +   if ((inputs_read & (VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT))
> > == 0) {
> 
> You don't actually need to do this, they will be in slot 0, so the
> first
> iteration of the loop will pick them up. :)

Actually, we need this, there are regressions without this condition. 

The problem is that these two varyings correspond to varying slots 22
and 23, but we have them in the vue header, so slot 0, so we remove the
slots when we setup the vue map and we need to check for them
explicitly, see this code from brw_vue_map.c:

  /* gl_Layer and gl_ViewportIndex don't get their own varying slots --
they
    * are stored in the first VUE slot (VARYING_SLOT_PSIZ).
    */
   slots_valid &= ~(VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT);

> With the extra condition dropped,
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
> I don't know why I was making this way more complicated than it
> needed
> to be.  Originally, when I tried to do this, I was adding a prog_data
> field, and doing a whole bunch of changes all over...your approach of
> introducing a helper function and calling it in the relevant places
> works much better :)  Nice work.

Thanks, but I wasn't doing much better myself by trying to do this only
for clip distances really :)

Iago


More information about the mesa-dev mailing list