[Mesa-dev] [PATCH v2] i965: skip reading clip distances from the URB for the FS if possible

Kenneth Graunke kenneth at whitecape.org
Fri Sep 29 04:55:49 UTC 2017


On Thursday, September 28, 2017 3:33:12 AM PDT Iago Toral Quiroga wrote:
> we can skip these slots when they are not read in the fragment shader
> and they are positioned right after a VUE header that we are already
> skipping. We also need to ensure that we are passing at least one other
> varying, since that is a hardware requirement.
> 
> This helps alleviate a problem introduced with 99df02ca26f61 for
> separate shader objects: without separate shader objects we assign
> locations sequentially, however, since that commit we have changed the
> method for SSO so that the VUE slot assigned depends on the number of
> builtin slots plus the location assigned to the varying. This fixed
> layout is intended to help SSO programs by avoiding on-the-fly recompiles
> when swapping out shaders, however, it also means that if a varying uses
> a large location number close to the maximum allowed by the SF/FS units
> (31), then the offset introduced by the number of builtin slots can push
> the location outside the range and trigger an assertion.
> 
> This problem is affecting at least the following CTS tests for
> enhanced layouts:
> 
> KHR-GL45.enhanced_layouts.varying_array_components
> KHR-GL45.enhanced_layouts.varying_array_locations
> KHR-GL45.enhanced_layouts.varying_components
> KHR-GL45.enhanced_layouts.varying_locations
> 
> which use SSO and the the location layout qualifier to select such
> location numbers explicitly.
> 
> This change helps these tests because for SSO we always have to include
> VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is very unlikely
> to read them, so by doing this we free builtin slots from the fixed VUE
> layout and we avoid the tests to crash in this scenario.
> 
> Of course, this is not a proper fix, we'd still run into problems if someone
> tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or
> gl_CullDistancein in the FS, but that would be a much less common bug and we
> can probably wait to see if anyone actually runs into that situation in a real
> world scenario before making the decision that more aggresive changes are
> required to support this without reverting 99df02ca26f61.
> 
> v2:
> - Add a debug message when we skip clip distances (Ilia)
> - we also need to account for this when we compute the urb setup
>   for the fragment shader stage, so add a compiler util to compute
>   the first slot that we need to read from the URB instead of
>   replicating the logic in both places.
> 
> Suggested-by: Kenneth Graunke <kenneth at whitecape.org>
> ---
>  src/intel/compiler/brw_compiler.h             | 50 +++++++++++++++++++++++++++
>  src/intel/compiler/brw_fs.cpp                 |  7 ++--
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 16 ++++-----
>  3 files changed, 59 insertions(+), 14 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_compiler.h b/src/intel/compiler/brw_compiler.h
> index 6753a8daf0..411919d829 100644
> --- a/src/intel/compiler/brw_compiler.h
> +++ b/src/intel/compiler/brw_compiler.h
> @@ -25,6 +25,7 @@
>  #define BRW_COMPILER_H
>  
>  #include <stdio.h>
> +#include "common/gen_debug.h"
>  #include "common/gen_device_info.h"
>  #include "main/mtypes.h"
>  #include "main/macros.h"
> @@ -1222,6 +1223,55 @@ brw_stage_has_packed_dispatch(const struct gen_device_info *devinfo,
>     }
>  }
>  
> +static inline int
> +brw_compute_sf_first_urb_slot(uint64_t inputs_read,
> +                              const struct brw_vue_map *prev_stage_vue_map)
> +{
> +   /* If the fragment shader reads VARYING_SLOT_LAYER, then we need to pass in
> +    * the full vertex header.  Otherwise, we can program the SF to start
> +    * reading at an offset of 1 (2 varying slots) to skip unnecessary data:
> +    * - VARYING_SLOT_PSIZ and BRW_VARYING_SLOT_NDC on gen4-5
> +    * - VARYING_SLOT_{PSIZ,LAYER} and VARYING_SLOT_POS on gen6+
> +    */
> +   bool include_vue_header =
> +     inputs_read & (VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT);
> +
> +   int first_slot =
> +      include_vue_header ? 0 : 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
> +
> +   /* If we are already skipping the first 2 slots for the VUE header then we
> +    * can also skip clip distances if they are located right after the header
> +    * and the FS doesn't read them. Only do this is there are any user varyings
> +    * though, since the hardware requires that we pass at least one varying
> +    * between stages.
> +    */
> +   uint64_t var_bits = inputs_read & (~(VARYING_BIT_VAR(0) - 1));
> +   uint64_t clip_dist_bits = VARYING_BIT_CLIP_DIST0 | VARYING_BIT_CLIP_DIST1;
> +
> +   if (!include_vue_header && var_bits &&
> +       (inputs_read & clip_dist_bits) == 0 &&
> +       (prev_stage_vue_map->slots_valid & clip_dist_bits)) {
> +
> +      uint32_t clip_dist0_slot =
> +         prev_stage_vue_map->varying_to_slot[VARYING_SLOT_CLIP_DIST0];
> +
> +      uint32_t clip_dist1_slot =
> +         prev_stage_vue_map->varying_to_slot[VARYING_SLOT_CLIP_DIST1];
> +
> +      if (clip_dist0_slot >= 2 && clip_dist0_slot <= 3 &&
> +          clip_dist1_slot >= 2 && clip_dist1_slot <= 3) {
> +         first_slot += 2 * BRW_SF_URB_ENTRY_READ_OFFSET;

Do we actually really care about clip distance, specifically?  Couldn't
we basically just do:

   for (int i = 0; i < prev_stage_vue_map->num_slots; i++) {
      int varying = prev_stage_vue_map->slot_to_varying[i];
      if (varying > 0 && (inputs_read & BITFIELD64_BIT(varying)) != 0)
         return ROUND_DOWN_TO(i, 2);
   }

   return 0;

to find the first slot occupied by something, or the header slot if
there are no varyings read at all?  That would take care of skipping
over the header, clip distance, gl_Color, and any other junk slots.

> +
> +         if (unlikely(INTEL_DEBUG & DEBUG_URB)) {
> +            fprintf(stderr, "SF unit skips unused clip distance slots after "
> +                    "the VUE header.\n");
> +         }
> +      }
> +   }
> +
> +   return first_slot;
> +}
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index b13b56cfba..5b444f7baa 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -1481,9 +1481,6 @@ fs_visitor::calculate_urb_setup()
>              }
>           }
>        } else {
> -         bool include_vue_header =
> -            nir->info.inputs_read & (VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT);
> -
>           /* We have enough input varyings that the SF/SBE pipeline stage can't
>            * arbitrarily rearrange them to suit our whim; we have to put them
>            * in an order that matches the output of the previous pipeline stage
> @@ -1493,8 +1490,10 @@ fs_visitor::calculate_urb_setup()
>           brw_compute_vue_map(devinfo, &prev_stage_vue_map,
>                               key->input_slots_valid,
>                               nir->info.separate_shader);
> +
>           int first_slot =
> -            include_vue_header ? 0 : 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
> +            brw_compute_sf_first_urb_slot(nir->info.inputs_read,
> +                                          &prev_stage_vue_map);
>  
>           assert(prev_stage_vue_map.num_slots <= first_slot + 32);
>           for (int slot = first_slot; slot < prev_stage_vue_map.num_slots;
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
> index 2a99376e3c..a3c1cf5166 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -1029,17 +1029,13 @@ genX(calculate_attr_overrides)(const struct brw_context *brw,
>  
>     *point_sprite_enables = 0;
>  
> -   /* If the fragment shader reads VARYING_SLOT_LAYER, then we need to pass in
> -    * the full vertex header.  Otherwise, we can program the SF to start
> -    * reading at an offset of 1 (2 varying slots) to skip unnecessary data:
> -    * - VARYING_SLOT_PSIZ and BRW_VARYING_SLOT_NDC on gen4-5
> -    * - VARYING_SLOT_{PSIZ,LAYER} and VARYING_SLOT_POS on gen6+
> -    */
> -
> -   bool fs_needs_vue_header = fp->info.inputs_read &
> -      (VARYING_BIT_LAYER | VARYING_BIT_VIEWPORT);
> +   int first_slot =
> +      brw_compute_sf_first_urb_slot(fp->info.inputs_read,
> +                                    &brw->vue_map_geom_out);
>  
> -   *urb_entry_read_offset = fs_needs_vue_header ? 0 : 1;
> +   /* There are 2 slots in each offset */
> +   assert(first_slot % 2 == 0);
> +   *urb_entry_read_offset = first_slot / 2;
>  
>     /* From the Ivybridge PRM, Vol 2 Part 1, 3DSTATE_SBE,
>      * description of dw10 Point Sprite Texture Coordinate Enable:
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170928/f29db594/attachment.sig>


More information about the mesa-dev mailing list