[Mesa-dev] [PATCH v3] i965: skip reading unused slots at the begining of the URB for the FS

Kenneth Graunke kenneth at whitecape.org
Fri Sep 29 10:53:53 UTC 2017


On Friday, September 29, 2017 3:36:34 AM PDT Iago Toral Quiroga wrote:
> We can start reading the URB at the first offset that contains varyings
> that are actually read in the URB. We still need to make sure that we
> read at least one varying to honor hardware requirements.
> 
> This helps alleviate a problem introduced with 99df02ca26f61 for
> separate shader objects: without separate shader objects we assign
> locations sequentially, however, since that commit we have changed the
> method for SSO so that the VUE slot assigned depends on the number of
> builtin slots plus the location assigned to the varying. This fixed
> layout is intended to help SSO programs by avoiding on-the-fly recompiles
> when swapping out shaders, however, it also means that if a varying uses
> a large location number close to the maximum allowed by the SF/FS units
> (31), then the offset introduced by the number of builtin slots can push
> the location outside the range and trigger an assertion.
> 
> This problem is affecting at least the following CTS tests for
> enhanced layouts:
> 
> KHR-GL45.enhanced_layouts.varying_array_components
> KHR-GL45.enhanced_layouts.varying_array_locations
> KHR-GL45.enhanced_layouts.varying_components
> KHR-GL45.enhanced_layouts.varying_locations
> 
> which use SSO and the the location layout qualifier to select such
> location numbers explicitly.
> 
> This change helps these tests because for SSO we always have to include
> things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is
> very unlikely to read them, so by doing this we free builtin slots from
> the fixed VUE layout and we avoid the tests to crash in this scenario.
> 
> Of course, this is not a proper fix, we'd still run into problems if someone
> tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or
> gl_CullDistancein in the FS, but that would be a much less common bug and we
> can probably wait to see if anyone actually runs into that situation in a real
> world scenario before making the decision that more aggresive changes are
> required to support this without reverting 99df02ca26f61.
> 
> v2:
> - Add a debug message when we skip clip distances (Ilia)
> - we also need to account for this when we compute the urb setup
>   for the fragment shader stage, so add a compiler util to compute
>   the first slot that we need to read from the URB instead of
>   replicating the logic in both places.
> 
> v3:
> - Make the util more generic so it can account for all unused slots
>   at the beginning of the URB, that will make it more useful (Ken).
> - Drop the debug message, it was not what Ilia was asking for.
> 
> Suggested-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/intel/compiler/brw_compiler.h             | 29 +++++++++++++++++++++++++++
>  src/intel/compiler/brw_fs.cpp                 |  7 +++----
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 16 ++++++---------
>  3 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
> index 6753a8daf0..038f3f9551 100644
> --- a/src/intel/compiler/brw_compiler.h
> +++ b/src/intel/compiler/brw_compiler.h
> @@ -1222,6 +1222,35 @@ brw_stage_has_packed_dispatch(const struct gen_device_info *devinfo,
>     }
>  }
>  
> +/**
> + * Computes the first varying slot in the URB produced by the previous stage
> + * that is used in the next stage. We do this by testing the varying slots in
> + * the previous stage's vue map against the inputs read in the next stage.
> + *
> + * Note that:
> + *
> + * - Each URB offset contains two varying slots and we can only skip a
> + *   full offset if both slots are unused, so the value we return here is always
> + *   rounded down to the closest multiple of two.
> + *
> + * - gl_Layer and gl_ViewportIndex don't have their own varying slots, they are
> + *   part of the vue header, so if these are read we can't skip anything.
> + */
> +static inline int
> +brw_compute_first_urb_slot_required(uint64_t inputs_read,
> +                                    const struct brw_vue_map *prev_stage_vue_map)
> +{
> +   if ((inputs_read & (VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT)) == 0) {

You don't actually need to do this, they will be in slot 0, so the first
iteration of the loop will pick them up. :)

With the extra condition dropped,
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

I don't know why I was making this way more complicated than it needed
to be.  Originally, when I tried to do this, I was adding a prog_data
field, and doing a whole bunch of changes all over...your approach of
introducing a helper function and calling it in the relevant places
works much better :)  Nice work.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170929/0f7a565f/attachment.sig>


More information about the mesa-dev mailing list