<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 23, 2016 at 1:19 AM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Friday, May 20, 2016 4:53:15 PM PDT Jason Ekstrand wrote:<br>
> The previous code got the BO the first time we encountered it. However,<br>
> this can potentially lead to problems if the BO is used for multiple arrays<br>
> with the same buffer object because the range we declare as busy may not be<br>
> quite right. By delaying the call to intel_bufferobj_buffer, we can ensure<br>
> that we have the full range for the given buffer.<br>
><br>
> Cc: "11.1 11.2" <<a href="mailto:mesa-stable@lists.freedesktop.org">mesa-stable@lists.freedesktop.org</a>><br>
> Reviewed-by: Iago Toral Quiroga <<a href="mailto:itoral@igalia.com">itoral@igalia.com</a>><br>
> ---<br>
> src/mesa/drivers/dri/i965/brw_draw_upload.c | 71 +++++++++++++++++++<br>
+---------<br>
> 1 file changed, 49 insertions(+), 22 deletions(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/<br>
dri/i965/brw_draw_upload.c<br>
> index 3ec37f8..0a7725d 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c<br>
> @@ -453,6 +453,11 @@ brw_prepare_vertices(struct brw_context *brw)<br>
> if (brw->vb.nr_buffers)<br>
> return;<br>
><br>
> + /* The range of data in a given buffer represented as [min, max) */<br>
> + struct intel_buffer_object *enabled_buffer[VERT_ATTRIB_MAX];<br>
> + uint32_t buffer_range_start[VERT_ATTRIB_MAX];<br>
> + uint32_t buffer_range_end[VERT_ATTRIB_MAX];<br>
> +<br>
> for (i = j = 0; i < brw->vb.nr_enabled; i++) {<br>
> struct brw_vertex_element *input = brw->vb.enabled[i];<br>
> const struct gl_client_array *glarray = input->glarray;<br>
> @@ -460,12 +465,31 @@ brw_prepare_vertices(struct brw_context *brw)<br>
> if (_mesa_is_bufferobj(glarray->BufferObj)) {<br>
> struct intel_buffer_object *intel_buffer =<br>
> intel_buffer_object(glarray->BufferObj);<br>
> - unsigned k;<br>
> +<br>
> + const uint32_t offset = (uintptr_t)glarray->Ptr;<br>
> +<br>
> + uint32_t start, range;<br>
> + if (glarray->InstanceDivisor) {<br>
> + start = offset;<br>
> + range = (glarray->StrideB * ((brw->num_instances /<br>
> + glarray->InstanceDivisor) - 1) +<br>
> + glarray->_ElementSize);<br>
> + } else {<br>
> + if (!brw->vb.index_bounds_valid) {<br>
> + start = 0;<br>
> + range = intel_buffer->Base.Size;<br>
> + } else {<br>
> + start = offset + min_index * glarray->StrideB;<br>
> + range = (glarray->StrideB * (max_index - min_index) +<br>
> + glarray->_ElementSize);<br>
> + }<br>
> + }<br>
><br>
> /* If we have a VB set to be uploaded for this buffer object<br>
> * already, reuse that VB state so that we emit fewer<br>
> * relocations.<br>
> */<br>
> + unsigned k;<br>
> for (k = 0; k < i; k++) {<br>
> const struct gl_client_array *other = brw->vb.enabled[k]->glarray;<br>
> if (glarray->BufferObj == other->BufferObj &&<br>
> @@ -475,6 +499,9 @@ brw_prepare_vertices(struct brw_context *brw)<br>
> {<br>
> input->buffer = brw->vb.enabled[k]->buffer;<br>
> input->offset = glarray->Ptr - other->Ptr;<br>
> +<br>
> + buffer_range_start[k] = MIN2(buffer_range_start[k], start);<br>
> + buffer_range_end[k] = MAX2(buffer_range_end[k], start +<br>
range);<br>
<br>
</div></div>Hey Jason,<br>
<br>
This patch should work, but I wonder if this is the solution we want.<br>
<br>
You've found a real bug - we go look for an existing VERTEX_BUFFER_STATE<br>
entry that could be reused. If we found one, we wouldn't call<br>
intel_bufferobj_buffer (and thus mark_buffer_gpu_usage), so we failed to<br>
update the busy-tracking information.<br>
<br>
However, rather than widening the existing range to MIN(starts),<br>
MAX(ends), I wonder if we shouldn't just mark the new subrange busy.<br>
<br>
I don't know how common this is, but consider the case where we had:<br>
<br>
Existing range: BO #1, [0, 100]<br>
New range: BO #1, [1000, 1100].<br>
<br>
Your code would widen the busy range to [0, 1100]. Yes, technically,<br>
that's how we set up the VERTEX_BUFFER_STATE. But the GPU should only<br>
ever read from the first or second range - never [101, 999].<br>
<br>
The point of reusing an existing VERTEX_BUFFER_STATE is supposedly to<br>
reduce relocations...but it seems unfortunate to expand busy ranges in<br>
order to do that, as stalls are way more expensive than relocations.<br>
<br>
It seems like we could solve this bug by simply adding an<br>
intel_bufferobj_buffer call:<br>
<span class=""><br>
for (k = 0; k < i; k++) {<br>
const struct gl_client_array *other = brw->vb.enabled[k]->glarray;<br>
if (glarray->BufferObj == other->BufferObj &&<br>
</span> glarray->StrideB == other->StrideB &&<br>
glarray->InstanceDivisor == other->InstanceDivisor &&<br>
(uintptr_t)(glarray->Ptr - other->Ptr) < glarray->StrideB)<br>
<span class=""> {<br>
input->buffer = brw->vb.enabled[k]->buffer;<br>
input->offset = glarray->Ptr - other->Ptr;<br>
</span>--NEW!-> intel_bufferobj_buffer(brw, intel_buffer, offset, size);<br>
break;<br>
}<br>
}<br>
<br>
I suppose we'd still need to move the offset/size calculations earlier,<br>
like you've done in this patch.<br>
<br>
Thoughts?<br><div class="HOEnZb"><div class="h5"></div></div></blockquote><div><br></div><div>Perhaps? I did think about that but it seemed like a lot of work. Also, I looked at intel_bufferobj_buffer and found out that it just uses an interval so as far as ranges go it works out exactly the same.<br><br></div><div>The real reason for the move wasn't the call to intel_bufferobj_buffer but so that we could get the correct size to set in VERTEX_BUFFER_STATE. We could emit more VB states to solve that problem and, TBH, that might not be a bad idea. I'm not really sure.<br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">
> break;<br>
> }<br>
> }<br>
> @@ -482,29 +509,13 @@ brw_prepare_vertices(struct brw_context *brw)<br>
> struct brw_vertex_buffer *buffer = &brw->vb.buffers[j];<br>
><br>
> /* Named buffer object: Just reference its contents directly. */<br>
> - buffer->offset = (uintptr_t)glarray->Ptr;<br>
> + buffer->offset = offset;<br>
> buffer->stride = glarray->StrideB;<br>
> buffer->step_rate = glarray->InstanceDivisor;<br>
><br>
> - uint32_t offset, size;<br>
> - if (glarray->InstanceDivisor) {<br>
> - offset = buffer->offset;<br>
> - size = (buffer->stride * ((brw->num_instances /<br>
> - glarray->InstanceDivisor) - 1) +<br>
> - glarray->_ElementSize);<br>
> - } else {<br>
> - if (!brw->vb.index_bounds_valid) {<br>
> - offset = 0;<br>
> - size = intel_buffer->Base.Size;<br>
> - } else {<br>
> - offset = buffer->offset + min_index * buffer->stride;<br>
> - size = (buffer->stride * (max_index - min_index) +<br>
> - glarray->_ElementSize);<br>
> - }<br>
> - }<br>
> - buffer->bo = intel_bufferobj_buffer(brw, intel_buffer,<br>
> - offset, size);<br>
> - drm_intel_bo_reference(buffer->bo);<br>
> + enabled_buffer[j] = intel_buffer;<br>
> + buffer_range_start[j] = start;<br>
> + buffer_range_end[j] = start + range;<br>
><br>
> input->buffer = j++;<br>
> input->offset = 0;<br>
> @@ -519,7 +530,7 @@ brw_prepare_vertices(struct brw_context *brw)<br>
> * probably a service to the poor programmer to do so rather than<br>
> * trying to just not render.<br>
> */<br>
> - assert(input->offset < brw->vb.buffers[input->buffer].bo->size);<br>
> + assert(input->offset < intel_buffer->Base.Size);<br>
> } else {<br>
> /* Queue the buffer object up to be uploaded in the next pass,<br>
> * when we've decided if we're doing interleaved or not.<br>
> @@ -554,6 +565,22 @@ brw_prepare_vertices(struct brw_context *brw)<br>
> }<br>
> }<br>
><br>
> + /* Now that we've set up all of the buffers, we walk through and<br>
reference<br>
> + * each of them. We do this late so that we get the right size in each<br>
> + * buffer and don't reference too little data.<br>
> + */<br>
> + for (i = 0; i < j; i++) {<br>
> + struct brw_vertex_buffer *buffer = &brw->vb.buffers[i];<br>
> + if (buffer->bo)<br>
> + continue;<br>
> +<br>
> + const uint32_t start = buffer_range_start[i];<br>
> + const uint32_t range = buffer_range_end[i] - buffer_range_start[i];<br>
> +<br>
> + buffer->bo = intel_bufferobj_buffer(brw, enabled_buffer[i], start,<br>
range);<br>
> + drm_intel_bo_reference(buffer->bo);<br>
> + }<br>
> +<br>
> /* If we need to upload all the arrays, then we can trim those arrays to<br>
> * only the used elements [min_index, max_index] so long as we adjust<br>
all<br>
> * the values used in the 3DPRIMITIVE i.e. by setting the vertex bias.<br>
><br>
<br>
</div></div></blockquote></div><br></div></div>