[Mesa-dev] [PATCH] i965: Fix Vertex URB Read Length calculation in 3DSTATE_SF on Gen6.
Paul Berry
stereotype441 at gmail.com
Sat Feb 2 10:28:52 PST 2013
On 2 February 2013 03:44, Kenneth Graunke <kenneth at whitecape.org> wrote:
> The old calculation was off by one in certain cases. The documentation
> contains a precise formula for how to calculate it, so just use that.
>
Wow, thank you for putting in the time and effort to track this down, Ken.
I spent a fair amount of time this morning instrumenting the code to
understand exactly what has changed, so that I could reassure myself that
this is a true bug fix and not just a behavioural change that
coincidentally makes the heisenbug disappear. I believe your fix is
correct, and I believe I understand why the bug happened (I even have a
speculation about how the hardware works that would explain why this fix is
necessary).
Can you see if my understanding of the bug makes sense to you? It's a
little different from your "off by one in certain cases" explanation, and I
want to make sure that's not due to a misunderstanding on my part.
My understanding of the bug as of this morning:
Previous to this patch, we thought that the only restrictions on
3DSTATE_SF's URB read length were (a) it needs to be large enough to read
all the VUE data that the SF needs, and (b) it can't be so large that it
tries to read VUE data that doesn't exist. Since the VUE map already tells
us how much VUE data exists, we didn't bother worrying about restriction
(a); we just did the easy thing and programmed the read length to satisfy
restriction (b).
However, we didn't notice this erratum in the hardware docs: "[errata]
Corruption/Hang possible if length programmed larger than recommended".
Judging by the context surrounding this erratum, it's pretty clear that it
means "URB read length must be exactly the size necessary to read all the
VUE data that the SF needs, and no larger". Which means that we can't
program the read length based on restriction (b)--we have to program it
based on restriction (a). Which is what your patch does.
So, while it's technically correct that the old calculation could be off by
one in certain cases, it doesn't really capture the essence of the problem,
because the old calculation was simply the wrong calculation. In some
cases it was correct, and in other cases it was off by one or sometimes
more. (In fact, in the test case Jordan isolated for bug 56920, I believe
it was off by 2). The real explanation here is that the URB read size
needs to precisely match the amount of data that the SF consumes; it
doesn't work to simply base it on the size of the VUE.
Speculation about why it's necessary for the URB read size to precisely
match the amount of data that the SF consumes: maybe the only thing
synchronizing the SF's URB reads with the rest of the work it does is a
data dependency. In other words, maybe the only thing stopping the SF from
skipping ahead to triangle B before triangle A's URB read completes is that
it has to wait for the data from that URB read before it can finish
processing triangle A. If we make the URB read size too big, we break that
synchronization mechanism by causing the SF to receive extra data that it's
not smart enough to wait around for. As a result, if we run a program
where the FS ignores the VS's last varying slot(s), and follow it
immediately by a program where the FS uses those slot(s), then there's a
danger that one of the VUE reads from the old program will still be in
flight when the new program starts, and it'll get misinterpreted as varying
data for the first triangle of the new program. This would explain pretty
much everything we've seen about this bug: why it only happens when the VS
outputs extra varyings that the FS doesn't use, why bug 56920 is so
sensitive to the ordering of the varyings, why bug 60172 bisects to a patch
that changes varying order, why the bug manifests as corrupted data in a
single vertex of the first triangle rendered by the second program, and
finally, why doing a glFinish() between the two programs masks the
bug--because it gives enough time for the extra URB read to complete before
the second program starts.
Does this sound right to you? Does it jive with what you saw in bug
60172? If so it would be nice to expand on the commit message a bit. I
don't think we need to include my speculation above, but at a minimum it
would be nice to quote the erratum, and explain that as a consequence of
it, we have to base the URB read length on the maximum VUE slot that is
used by the SF rather than the maximum VUE slot that exists.
One other minor comment below.
>
> Fixes random corruption in Steam's Big Picture Mode, random corruption
> in PlaneShift (since the varying reordering code landed), and Piglit
> test ________.
>
> NOTE: This is a candidate for all stable branches.
>
> (The code needs to get cleaned up - the final result is quite ugly - but
> I wanted to put it on the list for the people working on these bugs.)
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56920
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60172
> Signed-off-by<https://bugs.freedesktop.org/show_bug.cgi?id=60172Signed-off-by>:
> Kenneth Graunke <kenneth at whitecape.org>
> ---
> src/mesa/drivers/dri/i965/gen6_sf_state.c | 52
> +++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 17 deletions(-)
>
> I don't expect this to go upstream as is; it should get cleaned up first.
>
> We also should do the same thing for Ivybridge. Haven't tried that yet.
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> index c1bc252..83ff79c 100644
> --- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> @@ -113,7 +113,6 @@ upload_sf_state(struct brw_context *brw)
> {
> struct intel_context *intel = &brw->intel;
> struct gl_context *ctx = &intel->ctx;
> - uint32_t urb_entry_read_length;
> /* BRW_NEW_FRAGMENT_PROGRAM */
> uint32_t num_outputs =
> _mesa_bitcount_64(brw->fragment_program->Base.InputsRead);
> /* _NEW_LIGHT */
> @@ -125,26 +124,15 @@ upload_sf_state(struct brw_context *brw)
> bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
>
> int attr = 0, input_index = 0;
> + /* Skip over the first two entries in the VUE map (position and point
> size),
> + * as they're passed through anyway and reading them is just overhead.
> + */
> int urb_entry_read_offset = 1;
> float point_size;
> uint16_t attr_overrides[FRAG_ATTRIB_MAX];
> uint32_t point_sprite_origin;
>
> - /* CACHE_NEW_VS_PROG */
> - urb_entry_read_length = ((brw->vs.prog_data->vue_map.num_slots + 1) /
> 2 -
> - urb_entry_read_offset);
> - if (urb_entry_read_length == 0) {
> - /* Setting the URB entry read length to 0 causes undefined
> behavior, so
> - * if we have no URB data to read, set it to 1.
> - */
> - urb_entry_read_length = 1;
> - }
> -
> - dw1 =
> - GEN6_SF_SWIZZLE_ENABLE |
> - num_outputs << GEN6_SF_NUM_OUTPUTS_SHIFT |
> - urb_entry_read_length << GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT |
> - urb_entry_read_offset << GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT;
> + dw1 = GEN6_SF_SWIZZLE_ENABLE | num_outputs <<
> GEN6_SF_NUM_OUTPUTS_SHIFT;
>
> dw2 = GEN6_SF_STATISTICS_ENABLE |
> GEN6_SF_VIEWPORT_TRANSFORM_ENABLE;
> @@ -280,6 +268,7 @@ upload_sf_state(struct brw_context *brw)
> /* Create the mapping from the FS inputs we produce to the VS outputs
> * they source from.
> */
> + uint32_t max_source_attr = 0;
> for (; attr < FRAG_ATTRIB_MAX; attr++) {
> enum glsl_interp_qualifier interp_qualifier =
> brw->fragment_program->InterpQualifier[attr];
> @@ -312,12 +301,41 @@ upload_sf_state(struct brw_context *brw)
> assert(input_index < 16 || attr == input_index);
>
> /* CACHE_NEW_VS_PROG | _NEW_LIGHT | _NEW_PROGRAM */
> - attr_overrides[input_index++] =
> + attr_overrides[input_index] =
> get_attr_override(&brw->vs.prog_data->vue_map,
> urb_entry_read_offset, attr,
> ctx->VertexProgram._TwoSideEnabled);
> + bool swizzling = (attr_overrides[input_index] & 0xc0) != 0;
> + uint32_t source_attr =
> + (attr_overrides[input_index] & 0x1f) + (swizzling ? 1 : 0);
> +
> + if (source_attr > max_source_attr)
> + max_source_attr = source_attr;
> +
> + ++input_index;
>
Possible clean-up idea: pass the address of max_source_attr to
get_attr_override(), and let get_attr_override() update it. That avoids
having to take the bitfield back apart here, since get_attr_override()
already knows what the source_attr is and understands the effect of
swizzling. Also it will make it easier to share this bug fix between gen6
and gen7.
> }
>
> + /* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for
> + * 3DSTATE_SF DWord 1 bits 15:11, "Vertex URB Entry Read Length":
> + *
> + * "This field should be set to the minimum length required to read the
> + * maximum source attribute. The maximum source attribute is
> indicated
> + * by the maximum value of the enabled Attribute # Source Attribute if
> + * Attribute Swizzle Enable is set, Number of Output Attributes-1 if
> + * enable is not set.
> + * read_length = ceiling((max_source_attr + 1) / 2)
> + *
> + * [errata] Corruption/Hang possible if length programmed larger than
> + * recommended"
> + *
> + * We use Attribute Swizzle Enable, so max_source_attr is bits 4:0 of
> the
> + * last element of attr_overrides set above. Using ALIGN(x, 2)
> guarantees
> + * that the division will produce ceiling() rather than truncation.
> + */
> + uint32_t urb_entry_read_length = ALIGN(max_source_attr + 1, 2) / 2;
> + dw1 |= urb_entry_read_length << GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT |
> + urb_entry_read_offset << GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT;
> +
> for (; input_index < FRAG_ATTRIB_MAX; input_index++)
> attr_overrides[input_index] = 0;
>
> --
> 1.8.1.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130202/e32aad34/attachment.html>
More information about the mesa-dev
mailing list