[Mesa-dev] [PATCH] i965: Fix Vertex URB Read Length calculation in 3DSTATE_SF on Gen6.
Kenneth Graunke
kenneth at whitecape.org
Sat Feb 2 13:49:50 PST 2013
On 02/02/2013 10:28 AM, Paul Berry wrote:
> On 2 February 2013 03:44, Kenneth Graunke <kenneth at whitecape.org
> <mailto:kenneth at whitecape.org>> wrote:
>
> The old calculation was off by one in certain cases. The documentation
> contains a precise formula for how to calculate it, so just use that.
>
>
> Wow, thank you for putting in the time and effort to track this down,
> Ken. I spent a fair amount of time this morning instrumenting the code
> to understand exactly what has changed, so that I could reassure myself
> that this is a true bug fix and not just a behavioural change that
> coincidentally makes the heisenbug disappear. I believe your fix is
> correct, and I believe I understand why the bug happened (I even have a
> speculation about how the hardware works that would explain why this fix
> is necessary).
>
> Can you see if my understanding of the bug makes sense to you? It's a
> little different from your "off by one in certain cases" explanation,
> and I want to make sure that's not due to a misunderstanding on my part.
Yeah, the "it's sometimes off by one" explanation was more a "I've been
messing around with this formula and trimming apitraces for hours and
it's 4am and the formulas give different answers because this one
apparently works". You're right, I've definitely seen it off by 0, 1, or 2.
> My understanding of the bug as of this morning:
>
> Previous to this patch, we thought that the only restrictions on
> 3DSTATE_SF's URB read length were (a) it needs to be large enough to
> read all the VUE data that the SF needs, and (b) it can't be so large
> that it tries to read VUE data that doesn't exist. Since the VUE map
> already tells us how much VUE data exists, we didn't bother worrying
> about restriction (a); we just did the easy thing and programmed the
> read length to satisfy restriction (b).
>
> However, we didn't notice this erratum in the hardware docs: "[errata]
> Corruption/Hang possible if length programmed larger than recommended".
> Judging by the context surrounding this erratum, it's pretty clear that
> it means "URB read length must be exactly the size necessary to read all
> the VUE data that the SF needs, and no larger". Which means that we
> can't program the read length based on restriction (b)--we have to
> program it based on restriction (a). Which is what your patch does.
>
> So, while it's technically correct that the old calculation could be off
> by one in certain cases, it doesn't really capture the essence of the
> problem, because the old calculation was simply the wrong calculation.
> In some cases it was correct, and in other cases it was off by one or
> sometimes more. (In fact, in the test case Jordan isolated for bug
> 56920, I believe it was off by 2). The real explanation here is that
> the URB read size needs to precisely match the amount of data that the
> SF consumes; it doesn't work to simply base it on the size of the VUE.
This makes a ton of sense to me.
> Speculation about why it's necessary for the URB read size to precisely
> match the amount of data that the SF consumes: maybe the only thing
> synchronizing the SF's URB reads with the rest of the work it does is a
> data dependency. In other words, maybe the only thing stopping the SF
> from skipping ahead to triangle B before triangle A's URB read completes
> is that it has to wait for the data from that URB read before it can
> finish processing triangle A. If we make the URB read size too big, we
> break that synchronization mechanism by causing the SF to receive extra
> data that it's not smart enough to wait around for. As a result, if we
> run a program where the FS ignores the VS's last varying slot(s), and
> follow it immediately by a program where the FS uses those slot(s), then
> there's a danger that one of the VUE reads from the old program will
> still be in flight when the new program starts, and it'll get
> misinterpreted as varying data for the first triangle of the new
> program. This would explain pretty much everything we've seen about
> this bug: why it only happens when the VS outputs extra varyings that
> the FS doesn't use, why bug 56920 is so sensitive to the ordering of the
> varyings, why bug 60172 bisects to a patch that changes varying order,
> why the bug manifests as corrupted data in a single vertex of the first
> triangle rendered by the second program, and finally, why doing a
> glFinish() between the two programs masks the bug--because it gives
> enough time for the extra URB read to complete before the second program
> starts.
This sounds believable enough. And maybe they fixed some of that
synchronization in Ivybridge, which is why it works there. Or perhaps
we've just been lucky. I doubt we'll ever really know.
> Does this sound right to you? Does it jive with what you saw in bug
> 60172? If so it would be nice to expand on the commit message a bit. I
> don't think we need to include my speculation above, but at a minimum it
> would be nice to quote the erratum, and explain that as a consequence of
> it, we have to base the URB read length on the maximum VUE slot that is
> used by the SF rather than the maximum VUE slot that exists.
I cut and pasted your explanation into the commit message, since it's a
very clear explanation of what's going on. Thanks Paul!
> One other minor comment below.
>
>
> Fixes random corruption in Steam's Big Picture Mode, random corruption
> in PlaneShift (since the varying reordering code landed), and Piglit
> test ________.
>
> NOTE: This is a candidate for all stable branches.
>
> (The code needs to get cleaned up - the final result is quite ugly - but
> I wanted to put it on the list for the people working on these bugs.)
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56920
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60172
> Signed-off-by
> <https://bugs.freedesktop.org/show_bug.cgi?id=60172Signed-off-by>:
> Kenneth Graunke <kenneth at whitecape.org <mailto:kenneth at whitecape.org>>
> ---
> src/mesa/drivers/dri/i965/gen6_sf_state.c | 52
> +++++++++++++++++++++----------
> 1 file changed, 35 insertions(+), 17 deletions(-)
>
> I don't expect this to go upstream as is; it should get cleaned up
> first.
>
> We also should do the same thing for Ivybridge. Haven't tried that yet.
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> index c1bc252..83ff79c 100644
> --- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
> @@ -113,7 +113,6 @@ upload_sf_state(struct brw_context *brw)
> {
> struct intel_context *intel = &brw->intel;
> struct gl_context *ctx = &intel->ctx;
> - uint32_t urb_entry_read_length;
> /* BRW_NEW_FRAGMENT_PROGRAM */
> uint32_t num_outputs =
> _mesa_bitcount_64(brw->fragment_program->Base.InputsRead);
> /* _NEW_LIGHT */
> @@ -125,26 +124,15 @@ upload_sf_state(struct brw_context *brw)
> bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
>
> int attr = 0, input_index = 0;
> + /* Skip over the first two entries in the VUE map (position and
> point size),
> + * as they're passed through anyway and reading them is just
> overhead.
> + */
> int urb_entry_read_offset = 1;
> float point_size;
> uint16_t attr_overrides[FRAG_ATTRIB_MAX];
> uint32_t point_sprite_origin;
>
> - /* CACHE_NEW_VS_PROG */
> - urb_entry_read_length = ((brw->vs.prog_data->vue_map.num_slots +
> 1) / 2 -
> - urb_entry_read_offset);
> - if (urb_entry_read_length == 0) {
> - /* Setting the URB entry read length to 0 causes undefined
> behavior, so
> - * if we have no URB data to read, set it to 1.
> - */
> - urb_entry_read_length = 1;
> - }
> -
> - dw1 =
> - GEN6_SF_SWIZZLE_ENABLE |
> - num_outputs << GEN6_SF_NUM_OUTPUTS_SHIFT |
> - urb_entry_read_length << GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT |
> - urb_entry_read_offset << GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT;
> + dw1 = GEN6_SF_SWIZZLE_ENABLE | num_outputs <<
> GEN6_SF_NUM_OUTPUTS_SHIFT;
>
> dw2 = GEN6_SF_STATISTICS_ENABLE |
> GEN6_SF_VIEWPORT_TRANSFORM_ENABLE;
> @@ -280,6 +268,7 @@ upload_sf_state(struct brw_context *brw)
> /* Create the mapping from the FS inputs we produce to the VS
> outputs
> * they source from.
> */
> + uint32_t max_source_attr = 0;
> for (; attr < FRAG_ATTRIB_MAX; attr++) {
> enum glsl_interp_qualifier interp_qualifier =
> brw->fragment_program->InterpQualifier[attr];
> @@ -312,12 +301,41 @@ upload_sf_state(struct brw_context *brw)
> assert(input_index < 16 || attr == input_index);
>
> /* CACHE_NEW_VS_PROG | _NEW_LIGHT | _NEW_PROGRAM */
> - attr_overrides[input_index++] =
> + attr_overrides[input_index] =
> get_attr_override(&brw->vs.prog_data->vue_map,
> urb_entry_read_offset, attr,
> ctx->VertexProgram._TwoSideEnabled);
> + bool swizzling = (attr_overrides[input_index] & 0xc0) != 0;
> + uint32_t source_attr =
> + (attr_overrides[input_index] & 0x1f) + (swizzling ? 1 : 0);
> +
> + if (source_attr > max_source_attr)
> + max_source_attr = source_attr;
> +
> + ++input_index;
>
>
> Possible clean-up idea: pass the address of max_source_attr to
> get_attr_override(), and let get_attr_override() update it. That avoids
> having to take the bitfield back apart here, since get_attr_override()
> already knows what the source_attr is and understands the effect of
> swizzling. Also it will make it easier to share this bug fix between
> gen6 and gen7.
Yeah, I wanted to do something like that (but again, 4am). I took your
suggestion and it's actually something I wouldn't be embarassed to
upstream now. Thanks!
> }
>
> + /* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for
> + * 3DSTATE_SF DWord 1 bits 15:11, "Vertex URB Entry Read Length":
> + *
> + * "This field should be set to the minimum length required to
> read the
> + * maximum source attribute. The maximum source attribute is
> indicated
> + * by the maximum value of the enabled Attribute # Source
> Attribute if
> + * Attribute Swizzle Enable is set, Number of Output
> Attributes-1 if
> + * enable is not set.
> + * read_length = ceiling((max_source_attr + 1) / 2)
> + *
> + * [errata] Corruption/Hang possible if length programmed
> larger than
> + * recommended"
> + *
> + * We use Attribute Swizzle Enable, so max_source_attr is bits
> 4:0 of the
> + * last element of attr_overrides set above. Using ALIGN(x, 2)
> guarantees
> + * that the division will produce ceiling() rather than truncation.
> + */
> + uint32_t urb_entry_read_length = ALIGN(max_source_attr + 1, 2) / 2;
> + dw1 |= urb_entry_read_length <<
> GEN6_SF_URB_ENTRY_READ_LENGTH_SHIFT |
> + urb_entry_read_offset << GEN6_SF_URB_ENTRY_READ_OFFSET_SHIFT;
> +
> for (; input_index < FRAG_ATTRIB_MAX; input_index++)
> attr_overrides[input_index] = 0;
>
> --
> 1.8.1.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
More information about the mesa-dev
mailing list