[Mesa-dev] [PATCH] i965: skip reading clip distances from the URB for the FS if possible
Iago Toral Quiroga
itoral at igalia.com
Tue Sep 26 07:50:34 UTC 2017
we can skip these slots when they are not read in the fragment shader
and they are positioned right after a VUE header that we are already
skipping. We also need to ensure that we are passing at least one other
varying, since that is a hardware requirement.
This helps alleviate a problem introduced with 99df02ca26f61 for
separate shader objects: without separate shader objects we assign
locations sequentially, however, since that commit we have changed the
method for SSO so that the VUE slot assigned depends on the number of
builtin slots plus the location assigned to the varying. This fixed
layout is intended to help SSO programs by avoiding on-the-fly recompiles
when swapping out shaders, however, it also means that if a varying uses
a large location number close to the maximum allowed by the SF/FS units
(31), then the offset introduced by the number of builtin slots can push
the location outside the range and trigger an assertion.
This problem is affecting at least the following CTS tests for
enhanced layouts:
KHR-GL45.enhanced_layouts.varying_array_components
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_components
KHR-GL45.enhanced_layouts.varying_locations
which use SSO and the the location layout qualifier to select such
location numbers explicitly.
This change helps these tests because for SSO we always have to include
VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is very unlikely
to read them, so by doing this we free builtin slots from the fixed VUE
layout and we avoid the tests to crash in this scenario.
Of course, this is not a proper fix, we'd still run into problems if someone
tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or
gl_CullDistancein in the FS, but that would be a much less common bug and we
can probably wait to see if anyone actually runs into that situation in a real
world scenario before making the decision that more aggresive changes are
required to support this without reverting 99df02ca26f61.
Suggested-by: Kenneth Graunke <kenneth at whitecape.org>
---
src/mesa/drivers/dri/i965/genX_state_upload.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 612761601a..ce3515067f 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -1040,6 +1040,31 @@ genX(calculate_attr_overrides)(const struct brw_context *brw,
*urb_entry_read_offset = fs_needs_vue_header ? 0 : 1;
+ /* If we are already skipping the first 2 slots for the VUE header then we
+ * can also skip clip distances if they are located right after the header
+ * and the FS doesn't read them. Only do this is there are any user varyings
+ * though, since the hardware requires that we pass at least one varying
+ * between stages.
+ */
+ uint64_t var_bits = ~(VARYING_BIT_VAR(0) - 1);
+ uint64_t clip_dist_slots = VARYING_BIT_CLIP_DIST0 | VARYING_BIT_CLIP_DIST1;
+ if (!fs_needs_vue_header &&
+ var_bits &&
+ (brw->fragment_program->info.inputs_read & clip_dist_slots) == 0 &&
+ (brw->vue_map_geom_out.slots_valid & clip_dist_slots)) {
+
+ uint32_t clip_dist0_slot =
+ brw->vue_map_geom_out.varying_to_slot[VARYING_SLOT_CLIP_DIST0];
+
+ uint32_t clip_dist1_slot =
+ brw->vue_map_geom_out.varying_to_slot[VARYING_SLOT_CLIP_DIST1];
+
+ if (clip_dist0_slot >= 2 && clip_dist0_slot <= 3 &&
+ clip_dist1_slot >= 2 && clip_dist1_slot <= 3) {
+ *urb_entry_read_offset = 2;
+ }
+ }
+
/* From the Ivybridge PRM, Vol 2 Part 1, 3DSTATE_SBE,
* description of dw10 Point Sprite Texture Coordinate Enable:
*
--
2.11.0
More information about the mesa-dev
mailing list