[Mesa-dev] [PATCH v3 04/16] i965/draw: Delay when we get the bo for vertex buffers
Kenneth Graunke
kenneth at whitecape.org
Mon May 23 08:19:38 UTC 2016
On Friday, May 20, 2016 4:53:15 PM PDT Jason Ekstrand wrote:
> The previous code got the BO the first time we encountered it.  However,
> this can potentially lead to problems when multiple enabled arrays use the
> same buffer object, because the range we declare as busy may not be quite
> right.  By delaying the call to intel_bufferobj_buffer, we can ensure that
> we have the full range for the given buffer.
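The shape of the fix, for readers skimming the diff below: accumulate one
[min, max) byte range per buffer object while walking the enabled arrays,
then reference each buffer only once at the end, with its full extent.  A
standalone sketch of just the merging step (the names here are purely
illustrative, not the driver's actual API):

    #include <stdint.h>

    /* Half-open byte range: [start, end). */
    struct byte_range { uint32_t start, end; };

    /* Widen *r so that it also covers [start, end). */
    static void merge_range(struct byte_range *r,
                            uint32_t start, uint32_t end)
    {
       if (start < r->start) r->start = start;
       if (end > r->end) r->end = end;
    }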
>
> Cc: "11.1 11.2" <mesa-stable at lists.freedesktop.org>
> Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
> ---
> src/mesa/drivers/dri/i965/brw_draw_upload.c | 71 ++++++++++++++++++++---------
> 1 file changed, 49 insertions(+), 22 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> index 3ec37f8..0a7725d 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> @@ -453,6 +453,11 @@ brw_prepare_vertices(struct brw_context *brw)
> if (brw->vb.nr_buffers)
> return;
>
> + /* The range of data in a given buffer represented as [min, max) */
> + struct intel_buffer_object *enabled_buffer[VERT_ATTRIB_MAX];
> + uint32_t buffer_range_start[VERT_ATTRIB_MAX];
> + uint32_t buffer_range_end[VERT_ATTRIB_MAX];
> +
> for (i = j = 0; i < brw->vb.nr_enabled; i++) {
> struct brw_vertex_element *input = brw->vb.enabled[i];
> const struct gl_client_array *glarray = input->glarray;
> @@ -460,12 +465,31 @@ brw_prepare_vertices(struct brw_context *brw)
> if (_mesa_is_bufferobj(glarray->BufferObj)) {
> struct intel_buffer_object *intel_buffer =
> intel_buffer_object(glarray->BufferObj);
> - unsigned k;
> +
> + const uint32_t offset = (uintptr_t)glarray->Ptr;
> +
> + uint32_t start, range;
> + if (glarray->InstanceDivisor) {
> + start = offset;
> + range = (glarray->StrideB * ((brw->num_instances /
> + glarray->InstanceDivisor) - 1) +
> + glarray->_ElementSize);
> + } else {
> + if (!brw->vb.index_bounds_valid) {
> + start = 0;
> + range = intel_buffer->Base.Size;
> + } else {
> + start = offset + min_index * glarray->StrideB;
> + range = (glarray->StrideB * (max_index - min_index) +
> + glarray->_ElementSize);
> + }
> + }
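(As a concrete example of the arithmetic above: with StrideB = 16,
_ElementSize = 12, min_index = 2 and max_index = 5, the non-instanced
path gives start = offset + 2 * 16 and range = 16 * (5 - 2) + 12 = 60 -
exactly the bytes of elements 2 through 5, with nothing past the end of
element 5.)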
>
> /* If we have a VB set to be uploaded for this buffer object
> * already, reuse that VB state so that we emit fewer
> * relocations.
> */
> + unsigned k;
> for (k = 0; k < i; k++) {
> const struct gl_client_array *other = brw->vb.enabled[k]->glarray;
> if (glarray->BufferObj == other->BufferObj &&
> @@ -475,6 +499,9 @@ brw_prepare_vertices(struct brw_context *brw)
> {
> input->buffer = brw->vb.enabled[k]->buffer;
> input->offset = glarray->Ptr - other->Ptr;
> +
> + buffer_range_start[k] = MIN2(buffer_range_start[k], start);
> + buffer_range_end[k] = MAX2(buffer_range_end[k], start + range);
Hey Jason,

This patch should work, but I wonder if this is the solution we want.

You've found a real bug: we go looking for an existing VERTEX_BUFFER_STATE
entry that could be reused.  If we find one, we don't call
intel_bufferobj_buffer (and thus mark_buffer_gpu_usage), so we fail to
update the busy-tracking information.

However, rather than widening the existing range to MIN(starts),
MAX(ends), I wonder if we shouldn't just mark the new subrange busy.

I don't know how common this is, but consider the case where we had:

   Existing range: BO #1, [0, 100]
   New range:      BO #1, [1000, 1100]

Your code would widen the busy range to [0, 1100].  Yes, technically,
that's how we set up the VERTEX_BUFFER_STATE.  But the GPU should only
ever read from the first or second range - never [101, 999].

The point of reusing an existing VERTEX_BUFFER_STATE is presumably to
reduce relocations...but it seems unfortunate to expand busy ranges in
order to do that, as stalls are way more expensive than relocations.

It seems like we could solve this bug by simply adding an
intel_bufferobj_buffer call:

   for (k = 0; k < i; k++) {
      const struct gl_client_array *other = brw->vb.enabled[k]->glarray;
      if (glarray->BufferObj == other->BufferObj &&
          glarray->StrideB == other->StrideB &&
          glarray->InstanceDivisor == other->InstanceDivisor &&
          (uintptr_t)(glarray->Ptr - other->Ptr) < glarray->StrideB)
      {
         input->buffer = brw->vb.enabled[k]->buffer;
         input->offset = glarray->Ptr - other->Ptr;
--NEW!-> intel_bufferobj_buffer(brw, intel_buffer, offset, size);
         break;
      }
   }

I suppose we'd still need to move the offset/size calculations earlier,
like you've done in this patch.
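Roughly, that means hoisting the old offset/size computation above the
loop (an untested sketch; buffer->offset and buffer->stride become their
glarray equivalents, since the buffer slot isn't filled in yet at this
point):

   uint32_t offset, size;
   if (glarray->InstanceDivisor) {
      offset = (uintptr_t)glarray->Ptr;
      size = glarray->StrideB * ((brw->num_instances /
                                  glarray->InstanceDivisor) - 1) +
             glarray->_ElementSize;
   } else if (!brw->vb.index_bounds_valid) {
      offset = 0;
      size = intel_buffer->Base.Size;
   } else {
      offset = (uintptr_t)glarray->Ptr + min_index * glarray->StrideB;
      size = glarray->StrideB * (max_index - min_index) +
             glarray->_ElementSize;
   }
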
Thoughts?
> break;
> }
> }
> @@ -482,29 +509,13 @@ brw_prepare_vertices(struct brw_context *brw)
> struct brw_vertex_buffer *buffer = &brw->vb.buffers[j];
>
> /* Named buffer object: Just reference its contents directly. */
> - buffer->offset = (uintptr_t)glarray->Ptr;
> + buffer->offset = offset;
> buffer->stride = glarray->StrideB;
> buffer->step_rate = glarray->InstanceDivisor;
>
> - uint32_t offset, size;
> - if (glarray->InstanceDivisor) {
> - offset = buffer->offset;
> - size = (buffer->stride * ((brw->num_instances /
> - glarray->InstanceDivisor) - 1) +
> - glarray->_ElementSize);
> - } else {
> - if (!brw->vb.index_bounds_valid) {
> - offset = 0;
> - size = intel_buffer->Base.Size;
> - } else {
> - offset = buffer->offset + min_index * buffer->stride;
> - size = (buffer->stride * (max_index - min_index) +
> - glarray->_ElementSize);
> - }
> - }
> - buffer->bo = intel_bufferobj_buffer(brw, intel_buffer,
> - offset, size);
> - drm_intel_bo_reference(buffer->bo);
> + enabled_buffer[j] = intel_buffer;
> + buffer_range_start[j] = start;
> + buffer_range_end[j] = start + range;
>
> input->buffer = j++;
> input->offset = 0;
> @@ -519,7 +530,7 @@ brw_prepare_vertices(struct brw_context *brw)
> * probably a service to the poor programmer to do so rather than
> * trying to just not render.
> */
> - assert(input->offset < brw->vb.buffers[input->buffer].bo->size);
> + assert(input->offset < intel_buffer->Base.Size);
> } else {
> /* Queue the buffer object up to be uploaded in the next pass,
> * when we've decided if we're doing interleaved or not.
> @@ -554,6 +565,22 @@ brw_prepare_vertices(struct brw_context *brw)
> }
> }
>
> + /* Now that we've set up all of the buffers, we walk through and reference
> + * each of them. We do this late so that we get the right size in each
> + * buffer and don't reference too little data.
> + */
> + for (i = 0; i < j; i++) {
> + struct brw_vertex_buffer *buffer = &brw->vb.buffers[i];
> + if (buffer->bo)
> + continue;
> +
> + const uint32_t start = buffer_range_start[i];
> + const uint32_t range = buffer_range_end[i] - buffer_range_start[i];
> +
> + buffer->bo = intel_bufferobj_buffer(brw, enabled_buffer[i], start, range);
> + drm_intel_bo_reference(buffer->bo);
> + }
> +
> /* If we need to upload all the arrays, then we can trim those arrays to
> * only the used elements [min_index, max_index] so long as we adjust all
> * the values used in the 3DPRIMITIVE i.e. by setting the vertex bias.
>