<div dir="ltr">On 2 February 2013 03:44, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The old calculation was off by one in certain cases. The documentation<br>
contains a precise formula for how to calculate it, so just use that.<br></blockquote><div><br></div><div>Wow, thank you for putting in the time and effort to track this down, Ken. I spent a fair amount of time this morning instrumenting the code to understand exactly what has changed, so that I could reassure myself that this is a true bug fix and not just a behavioural change that coincidentally makes the heisenbug disappear. I believe your fix is correct, and I believe I understand why the bug happened (I even have a speculation about how the hardware works that would explain why this fix is necessary).<br>
<br>Can you see if my understanding of the bug makes sense to you? It's a little different from your "off by one in certain cases" explanation, and I want to make sure that's not due to a misunderstanding on my part.<br>
<br><br>My understanding of the bug as of this morning:<br><br></div><div>Previous to this patch, we thought that the only restrictions on 3DSTATE_SF's URB read length were (a) it needs to be large enough to read all the VUE data that the SF needs, and (b) it can't be so large that it tries to read VUE data that doesn't exist. Since the VUE map already tells us how much VUE data exists, we didn't bother worrying about restriction (a); we just did the easy thing and programmed the read length to satisfy restriction (b).<br>
<br></div>However, we didn't notice this erratum in the hardware docs: "[errata] Corruption/Hang possible if length programmed larger than recommended". Judging by the context surrounding this erratum, it's pretty clear that it means "URB read length must be exactly the size necessary to read all the VUE data that the SF needs, and no larger". Which means that we can't program the read length based on restriction (b)--we have to program it based on restriction (a). Which is what your patch does.<br>
<br></div><div class="gmail_quote">So, while it's technically correct that the old calculation could be off by one in certain cases, it doesn't really capture the essence of the problem, because the old calculation was simply the wrong calculation. In some cases it was correct, and in other cases it was off by one or sometimes more. (In fact, in the test case Jordan isolated for bug 56920, I believe it was off by 2). The real explanation here is that the URB read size needs to precisely match the amount of data that the SF consumes; it doesn't work to simply base it on the size of the VUE.<br>
</div><div class="gmail_quote"><div><br></div><div><br>Speculation about why it's necessary for the URB read size to precisely match the amount of data that the SF consumes: maybe the only thing synchronizing the SF's URB reads with the rest of the work it does is a data dependency. In other words, maybe the only thing stopping the SF from skipping ahead to triangle B before triangle A's URB read completes is that it has to wait for the data from that URB read before it can finish processing triangle A. If we make the URB read size too big, we break that synchronization mechanism by causing the SF to receive extra data that it's not smart enough to wait around for. As a result, if we run a program where the FS ignores the VS's last varying slot(s), and follow it immediately by a program where the FS uses those slot(s), then there's a danger that one of the VUE reads from the old program will still be in flight when the new program starts, and it'll get misinterpreted as varying data for the first triangle of the new program. This would explain pretty much everything we've seen about this bug: why it only happens when the VS outputs extra varyings that the FS doesn't use, why bug 56920 is so sensitive to the ordering of the varyings, why bug 60172 bisects to a patch that changes varying order, why the bug manifests as corrupted data in a single vertex of the first triangle rendered by the second program, and finally, why doing a glFinish() between the two programs masks the bug--because it gives enough time for the extra URB read to complete before the second program starts.<br>
<br></div><div><br>Does this sound right to you? Does it jive with what you saw in bug 60172? If so it would be nice to expand on the commit message a bit. I don't think we need to include my speculation above, but at a minimum it would be nice to quote the erratum, and explain that as a consequence of it, we have to base the URB read length on the maximum VUE slot that is used by the SF rather than the maximum VUE slot that exists.<br>
<br><br></div><div>One other minor comment below.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Fixes random corruption in Steam's Big Picture Mode, random corruption<br>
in PlaneShift (since the varying reordering code landed), and Piglit<br>
test ________.<br>
<br>
NOTE: This is a candidate for all stable branches.<br>
<br>
(The code needs to get cleaned up - the final result is quite ugly - but<br>
I wanted to put it on the list for the people working on these bugs.)<br>
<br>
Bugzilla: <a href="https://bugs.freedesktop.org/show_bug.cgi?id=56920" target="_blank">https://bugs.freedesktop.org/show_bug.cgi?id=56920</a><br>
Bugzilla: <a href="https://bugs.freedesktop.org/show_bug.cgi?id=60172Signed-off-by" target="_blank">https://bugs.freedesktop.org/show_bug.cgi?id=60172<br>
Signed-off-by</a>: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>><br>
---<br>
src/mesa/drivers/dri/i965/gen6_sf_state.c | 52 +++++++++++++++++++++----------<br>
1 file changed, 35 insertions(+), 17 deletions(-)<br>
<br>
I don't expect this to go upstream as is; it should get cleaned up first.<br>
<br>
We also should do the same thing for Ivybridge. Haven't tried that yet.<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c<br>
index c1bc252..83ff79c 100644<br>
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c<br>
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c<br>
@@ -113,7 +113,6 @@ upload_sf_state(struct brw_context *brw)<br>
{<br>
struct intel_context *intel = &brw->intel;<br>
struct gl_context *ctx = &intel->ctx;<br>
- uint32_t urb_entry_read_length;<br>
/* BRW_NEW_FRAGMENT_PROGRAM */<br>
uint32_t num_outputs = _mesa_bitcount_64(brw->fragment_program->Base.InputsRead);<br>
/* _NEW_LIGHT */<br>
@@ -125,26 +124,15 @@ upload_sf_state(struct brw_context *brw)<br>
bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;<br>
<br>
int attr = 0, input_index = 0;<br>
+ /* Skip over the first two entries in the VUE map (position and point size),<br>
+ * as they're passed through anyway and reading them is just overhead.<br>
+ */<br>
int urb_entry_read_offset = 1;<br>
float point_size;<br>
uint16_t attr_overrides[FRAG_ATTRIB_MAX];<br>
uint32_t point_sprite_origin;<br>
<br>
- /* CACHE_NEW_VS_PROG */<br>
- urb_entry_read_length = ((brw->vs.prog_data->vue_map.num_slots + 1) / 2 -<br>
- urb_entry_read_offset);<br>
- if (urb_entry_read_length == 0) {<br>
- /* Setting the URB entry read length to 0 causes undefined behavior, so<br>
- * if we have no URB data to read, set it to 1.<br>
- */<br>
- urb_entry_read_length = 1;<br>
- }<br>
-<br>
- dw1 =<br>
- GEN6_SF_SWIZZLE_ENABLE |<br>
- num_outputs << GEN6_SF_NUM_OUTPUTS_SHIFT |<br>
- urb_entry_read_length << GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT |<br>
- urb_entry_read_offset << GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT;<br>
+ dw1 = GEN6_SF_SWIZZLE_ENABLE | num_outputs << GEN6_SF_NUM_OUTPUTS_SHIFT;<br>
<br>
dw2 = GEN6_SF_STATISTICS_ENABLE |<br>
GEN6_SF_VIEWPORT_TRANSFORM_ENABLE;<br>
@@ -280,6 +268,7 @@ upload_sf_state(struct brw_context *brw)<br>
/* Create the mapping from the FS inputs we produce to the VS outputs<br>
* they source from.<br>
*/<br>
+ uint32_t max_source_attr = 0;<br>
for (; attr < FRAG_ATTRIB_MAX; attr++) {<br>
enum glsl_interp_qualifier interp_qualifier =<br>
brw->fragment_program->InterpQualifier[attr];<br>
@@ -312,12 +301,41 @@ upload_sf_state(struct brw_context *brw)<br>
assert(input_index < 16 || attr == input_index);<br>
<br>
/* CACHE_NEW_VS_PROG | _NEW_LIGHT | _NEW_PROGRAM */<br>
- attr_overrides[input_index++] =<br>
+ attr_overrides[input_index] =<br>
get_attr_override(&brw->vs.prog_data->vue_map,<br>
urb_entry_read_offset, attr,<br>
ctx->VertexProgram._TwoSideEnabled);<br>
+ bool swizzling = (attr_overrides[input_index] & 0xc0) != 0;<br>
+ uint32_t source_attr =<br>
+ (attr_overrides[input_index] & 0x1f) + (swizzling ? 1 : 0);<br>
+<br>
+ if (source_attr > max_source_attr)<br>
+ max_source_attr = source_attr;<br>
+<br>
+ ++input_index;<br></blockquote><div><br></div><div>Possible clean-up idea: pass the address of max_source_attr to get_attr_override(), and let get_attr_override() update it. That avoids having to take the bitfield back apart here, since get_attr_override() already knows what the source_attr is and understands the effect of swizzling. Also it will make it easier to share this bug fix between gen6 and gen7.<br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
}<br>
<br>
+ /* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for<br>
+ * 3DSTATE_SF DWord 1 bits 15:11, "Vertex URB Entry Read Length":<br>
+ *<br>
+ * "This field should be set to the minimum length required to read the<br>
+ * maximum source attribute. The maximum source attribute is indicated<br>
+ * by the maximum value of the enabled Attribute # Source Attribute if<br>
+ * Attribute Swizzle Enable is set, Number of Output Attributes-1 if<br>
+ * enable is not set.<br>
+ * read_length = ceiling((max_source_attr + 1) / 2)<br>
+ *<br>
+ * [errata] Corruption/Hang possible if length programmed larger than<br>
+ * recommended"<br>
+ *<br>
+ * We use Attribute Swizzle Enable, so max_source_attr is bits 4:0 of the<br>
+ * last element of attr_overrides set above. Using ALIGN(x, 2) guarantees<br>
+ * that the division will produce ceiling() rather than truncation.<br>
+ */<br>
+ uint32_t urb_entry_read_length = ALIGN(max_source_attr + 1, 2) / 2;<br>
+ dw1 |= urb_entry_read_length << GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT |<br>
+ urb_entry_read_offset << GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT;<br>
+<br>
for (; input_index < FRAG_ATTRIB_MAX; input_index++)<br>
attr_overrides[input_index] = 0;<br>
<span><font color="#888888"><br>
--<br>
1.8.1.2<br>
<br>
_______________________________________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
</font></span></blockquote></div><br></div></div>