<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 14, 2014 at 4:42 PM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">According to INTEL_DEBUG=perf, "Borderlands: The Pre-Sequel" was<br>
stalling on nearly every glBufferSubData call, with very slightly<br>
overlapping busy ranges.<br>
<br>
It turns out the draw upload code was accidentally including an extra<br>
stride's worth of data in the vertex buffer size due to a simple<br>
off-by-one error. We considered this extra bit of buffer space to be<br>
busy (in use by the GPU), when it was actually idle.<br></blockquote><div><br></div><div>Nice Catch!<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
The new diagram should make it easier to understand the formula. It's<br>
basically what I drew on paper when working through an actual<br>
glDrawRangeElements call.<br>
<br>
Eliminates all glBufferSubData stalls in "Borderlands: The Pre-Sequel."<br>
<br>
Signed-off-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>><br>
Cc: <a href="mailto:mesa-stable@lists.freedesktop.org" target="_blank">mesa-stable@lists.freedesktop.org</a><br>
---<br>
src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +++++++++++++++++++++-<br>
1 file changed, 21 insertions(+), 1 deletion(-)<br>
<br>
No Piglit regressions on Haswell. This might help Dota 2 and Serious Sam 3<br>
as well, but I haven't checked.<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c<br>
index 5a12439..6cb653c 100644<br>
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c<br>
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c<br>
@@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw)<br>
offset = 0;<br>
size = intel_buffer->Base.Size;<br>
} else {<br>
+ /* Compute the size/amount of data referenced by the GPU.<br>
+ * If the data is interleaved, StrideB may be larger than<br>
+ * _ElementSize. As an example, assume we have 2 interleaved<br>
+ * attributes A and B. The data is organized like this:<br>
+ *<br>
+ * Stride EltSize<br>
+ * _,,_ ,<br>
+ * / \ / \<br>
+ * A: --- --- --- --- --- ---<br>
+ * B: --- --- --- --- --- ---<br>
+ *<br>
+ * |===== 4 elts ======| (4-1) * Stride + EltSize<br>
+ *<br>
+ * max_index - min_index gives the number of elements that<br>
+ * will be referenced. Say we're drawing 4 elements. On<br>
+ * the first three, we need the full stride in order to get<br>
+ * to the next element. But on the last, we only want the<br>
+ * element size, since we don't actually read the other<br>
+ * interleaved vertex attributes stored beyond that.<br>
+ */<br>
offset = buffer->offset + min_index * buffer->stride;<br>
- size = (buffer->stride * (max_index - min_index) +<br>
+ size = (buffer->stride * MAX2(max_index - min_index - 1, 0) +<br>
glarray->_ElementSize);<br></blockquote><div><br></div><div>I'm not so sure about this new formula. Looking at the docs on glDrawRangeElements, the index range is inclusive on both ends meaning that the number of elements drawn is max_index - min_index + 1. If that's the case, then the old formula was correct. Are we using a half-open interval in mesa?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
}<br>
}<br>
<span><font color="#888888">--<br>
2.1.2<br>
<br>
_______________________________________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
</font></span></blockquote></div><br></div></div>