On 2 September 2011 12:59, Eric Anholt <span dir="ltr"><<a href="mailto:eric@anholt.net">eric@anholt.net</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Fri, 2 Sep 2011 09:06:44 -0700, Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> wrote:<br>
> Previously, the old VS backend computed the URB entry size by adding<br>
> the number of vertex shader outputs to the size of the URB header.<br>
> This often produced a larger result than necessary, because some<br>
> vertex shader outputs are stored in the header, so they were being<br>
> double counted. This patch changes the old VS backend to compute the<br>
> URB entry size directly from the number of slots in the VUE map.<br>
<br>
</div>This kind of scares me, if the later stages with similar failure at<br>
computing the size might be trying to read too much data from a<br>
now-smaller VUE. I'm not sure what happens in that case. (hopefully<br>
the answer is "read garbage", but what if our vertex happens to be at<br>
the end of URB space?<br></blockquote><div><br>Yeah, good point. I don't know what the GPU would do if the vertex were at the end of URB space and a later stage tried to read beyond the end. Hopefully, as you say, the answer is "read garbage". Fortunately, later patches in the series adjust the later pipeline stages to compute their URB read sizes based on the VUE map. So by the time the whole patch series is applied, we should be fine. (In fact, we should be in far lower danger of this kind of problem, because after the whole patch series, all URB sizes are computed in the same way from the VUE map).<br>
<br>So, depending on how the hardware behaves when reading beyond the end of a URB entry, there is a minor danger that this patch will introduce a temporary regression that gets fixed by later patches in the series. I'm on the fence as to what to do about it. On one hand, I could move this patch to the end of the series and that would close the window for temporary regressions. On the other hand, that would mean that the VS-related changes aren't grouped together anymore. Advice?<br>
</div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im"><br>
> ---<br>
> src/mesa/drivers/dri/i965/brw_vs.h | 1 -<br>
> src/mesa/drivers/dri/i965/brw_vs_emit.c | 31 +++++++------------------------<br>
> 2 files changed, 7 insertions(+), 25 deletions(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_vs.h b/src/mesa/drivers/dri/i965/brw_vs.h<br>
> index a02c06d..c31869c 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_vs.h<br>
> +++ b/src/mesa/drivers/dri/i965/brw_vs.h<br>
> @@ -65,7 +65,6 @@ struct brw_vs_compile {<br>
><br>
> struct brw_vue_map vue_map;<br>
> GLuint first_output;<br>
> - GLuint nr_outputs;<br>
> GLuint last_scratch;<br>
><br>
> GLuint first_tmp;<br>
> diff --git a/src/mesa/drivers/dri/i965/brw_vs_emit.c b/src/mesa/drivers/dri/i965/brw_vs_emit.c<br>
> index e7667c8..7c430ce 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_vs_emit.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_vs_emit.c<br>
> @@ -402,36 +402,19 @@ static void brw_vs_alloc_regs( struct brw_vs_compile *c )<br>
> /* The VS VUEs are shared by VF (outputting our inputs) and VS, so size<br>
> * them to fit the biggest thing they need to.<br>
> */<br>
> - attributes_in_vue = MAX2(c->nr_outputs, c->nr_inputs);<br>
> + attributes_in_vue = MAX2(c->vue_map.num_slots, c->nr_inputs);<br>
<br>
</div>There's a subtle change here, that we're no longer counting header regs<br>
towards the size of the VF input. Is that appropriate? I think it is,<br>
because brw_vs_state.c has vs->thread3.urb_entry_read_offset = 0.<br>
<br>
This probably deserves mention in the commit message.<br></blockquote><div><br>Good point. I will make mention of this in the commit message.<br><br>BTW I don't think the *new* VS backend accounts for the number of inputs at all when computing the URB entry size. That's probably a bug (I believe it would manifest itself for cases where there are a lot of vertex attributes but not many varyings). I kind of glazed over this detail when making the patch series because I don't understand how VS inputs work yet. Do you have an opinion on this?<br>
</div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im"><br>
> - /* See emit_vertex_write() for where the VUE's overhead on top of the<br>
> - * attributes comes from.<br>
> - */<br>
> - if (intel->gen >= 7) {<br>
> - int header_regs = 2;<br>
> - if (c->key.nr_userclip)<br>
> - header_regs += 2;<br>
> -<br>
> - /* Each attribute is 16 bytes (1 vec4), so dividing by 4 gives us the<br>
> - * number of 64-byte (512-bit) units.<br>
> - */<br>
> - c->prog_data.urb_entry_size = (attributes_in_vue + header_regs + 3) / 4;<br>
> - } else if (intel->gen == 6) {<br>
> - int header_regs = 2;<br>
> - if (c->key.nr_userclip)<br>
> - header_regs += 2;<br>
> -<br>
> - /* Each attribute is 16 bytes (1 vec4), so dividing by 8 gives us the<br>
> + if (intel->gen == 6) {<br>
> + /* Each attribute is 32 bytes (2 vec4s), so dividing by 8 gives us the<br>
> * number of 128-byte (1024-bit) units.<br>
> */<br>
<br>
</div>This comment change here looks wrong to me. 32 bytes * 8 = 256 bytes.<br>
Same below. An attribute is 1 vec4, and the urb_entry_size is for one<br>
vertex's URB entry, even though a register contains 2 vec4s which get<br>
output to two separate URB entries.<br></blockquote><div><br>Oops, you're right. This was a total brain failure; the comment was correct as is. Good catch.<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div><div></div><div class="h5"><br>
> - c->prog_data.urb_entry_size = (attributes_in_vue + header_regs + 7) / 8;<br>
> - } else if (intel->gen == 5)<br>
> + c->prog_data.urb_entry_size = (attributes_in_vue + 7) / 8;<br>
> + } else {<br>
> /* Each attribute is 16 bytes (1 vec4), so dividing by 4 gives us the<br>
> * number of 64-byte (512-bit) units.<br>
> */<br>
> - c->prog_data.urb_entry_size = (attributes_in_vue + 6 + 3) / 4;<br>
> - else<br>
> - c->prog_data.urb_entry_size = (attributes_in_vue + 2 + 3) / 4;<br>
> + c->prog_data.urb_entry_size = (attributes_in_vue + 3) / 4;<br>
> + }<br>
</div></div></blockquote></div><br>