[Mesa-dev] [PATCH v3 04/16] i965/draw: Delay when we get the bo for vertex buffers

Jason Ekstrand jason at jlekstrand.net
Mon May 23 08:35:31 UTC 2016


On Mon, May 23, 2016 at 1:19 AM, Kenneth Graunke <kenneth at whitecape.org>
wrote:

> On Friday, May 20, 2016 4:53:15 PM PDT Jason Ekstrand wrote:
> > The previous code got the BO the first time we encountered it.  However,
> > this can potentially lead to problems if the BO is used for multiple arrays
> > with the same buffer object because the range we declare as busy may not be
> > quite right.  By delaying the call to intel_bufferobj_buffer, we can ensure
> > that we have the full range for the given buffer.
> >
> > Cc: "11.1 11.2" <mesa-stable at lists.freedesktop.org>
> > Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_draw_upload.c | 71 ++++++++++++++++++++---------
> >  1 file changed, 49 insertions(+), 22 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > index 3ec37f8..0a7725d 100644
> > --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > @@ -453,6 +453,11 @@ brw_prepare_vertices(struct brw_context *brw)
> >     if (brw->vb.nr_buffers)
> >        return;
> >
> > +   /* The range of data in a given buffer represented as [min, max) */
> > +   struct intel_buffer_object *enabled_buffer[VERT_ATTRIB_MAX];
> > +   uint32_t buffer_range_start[VERT_ATTRIB_MAX];
> > +   uint32_t buffer_range_end[VERT_ATTRIB_MAX];
> > +
> >     for (i = j = 0; i < brw->vb.nr_enabled; i++) {
> >        struct brw_vertex_element *input = brw->vb.enabled[i];
> >        const struct gl_client_array *glarray = input->glarray;
> > @@ -460,12 +465,31 @@ brw_prepare_vertices(struct brw_context *brw)
> >        if (_mesa_is_bufferobj(glarray->BufferObj)) {
> >        struct intel_buffer_object *intel_buffer =
> >           intel_buffer_object(glarray->BufferObj);
> > -      unsigned k;
> > +
> > +         const uint32_t offset = (uintptr_t)glarray->Ptr;
> > +
> > +         uint32_t start, range;
> > +         if (glarray->InstanceDivisor) {
> > +            start = offset;
> > +            range = (glarray->StrideB * ((brw->num_instances /
> > +                                         glarray->InstanceDivisor) - 1) +
> > +                     glarray->_ElementSize);
> > +         } else {
> > +            if (!brw->vb.index_bounds_valid) {
> > +               start = 0;
> > +               range = intel_buffer->Base.Size;
> > +            } else {
> > +               start = offset + min_index * glarray->StrideB;
> > +               range = (glarray->StrideB * (max_index - min_index) +
> > +                        glarray->_ElementSize);
> > +            }
> > +         }
> >
> >        /* If we have a VB set to be uploaded for this buffer object
> >         * already, reuse that VB state so that we emit fewer
> >         * relocations.
> >         */
> > +      unsigned k;
> >        for (k = 0; k < i; k++) {
> >           const struct gl_client_array *other = brw->vb.enabled[k]->glarray;
> >           if (glarray->BufferObj == other->BufferObj &&
> > @@ -475,6 +499,9 @@ brw_prepare_vertices(struct brw_context *brw)
> >           {
> >              input->buffer = brw->vb.enabled[k]->buffer;
> >              input->offset = glarray->Ptr - other->Ptr;
> > +
> > +               buffer_range_start[k] = MIN2(buffer_range_start[k], start);
> > +               buffer_range_end[k] = MAX2(buffer_range_end[k], start + range);
>
> Hey Jason,
>
> This patch should work, but I wonder if this is the solution we want.
>
> You've found a real bug - we go looking for an existing VERTEX_BUFFER_STATE
> entry that can be reused.  If we find one, we don't call
> intel_bufferobj_buffer (and thus mark_buffer_gpu_usage), so we never
> update the busy-tracking information.
>
> However, rather than widening the existing range to [MIN(starts),
> MAX(ends)], I wonder if we shouldn't just mark the new subrange busy.
>
> I don't know how common this is, but consider the case where we had:
>
>     Existing range: BO #1, [0, 100]
>     New range: BO #1, [1000, 1100].
>
> Your code would widen the busy range to [0, 1100].  Yes, technically,
> that's how we set up the VERTEX_BUFFER_STATE.  But the GPU should only
> ever read from the first or second range - never [101, 999].
>
> The point of reusing an existing VERTEX_BUFFER_STATE is supposedly to
> reduce relocations...but it seems unfortunate to expand busy ranges in
> order to do that, as stalls are way more expensive than relocations.
>
> It seems like we could solve this bug by simply adding an
> intel_bufferobj_buffer call:
>
>    for (k = 0; k < i; k++) {
>       const struct gl_client_array *other = brw->vb.enabled[k]->glarray;
>       if (glarray->BufferObj == other->BufferObj &&
>           glarray->StrideB == other->StrideB &&
>           glarray->InstanceDivisor == other->InstanceDivisor &&
>           (uintptr_t)(glarray->Ptr - other->Ptr) < glarray->StrideB)
>       {
>          input->buffer = brw->vb.enabled[k]->buffer;
>          input->offset = glarray->Ptr - other->Ptr;
> --NEW!-> intel_bufferobj_buffer(brw, intel_buffer, offset, size);
>          break;
>       }
>    }
>
> I suppose we'd still need to move the offset/size calculations earlier,
> like you've done in this patch.
>
> Thoughts?
>

Perhaps?  I did think about that, but it seemed like a lot of work.  Also, I
looked at intel_bufferobj_buffer and found that its busy tracking just uses a
single interval per buffer, so as far as ranges go it works out exactly the
same either way.

The real reason for moving the calculation wasn't the call to
intel_bufferobj_buffer; it was so that we could set the correct size in
VERTEX_BUFFER_STATE.  We could emit more VB states to solve that problem and,
TBH, that might not be a bad idea.  I'm not really sure.
--Jason


> >              break;
> >           }
> >        }
> > @@ -482,29 +509,13 @@ brw_prepare_vertices(struct brw_context *brw)
> >           struct brw_vertex_buffer *buffer = &brw->vb.buffers[j];
> >
> >           /* Named buffer object: Just reference its contents directly. */
> > -         buffer->offset = (uintptr_t)glarray->Ptr;
> > +         buffer->offset = offset;
> >           buffer->stride = glarray->StrideB;
> >           buffer->step_rate = glarray->InstanceDivisor;
> >
> > -            uint32_t offset, size;
> > -            if (glarray->InstanceDivisor) {
> > -               offset = buffer->offset;
> > -               size = (buffer->stride * ((brw->num_instances /
> > -                                          glarray->InstanceDivisor) - 1) +
> > -                       glarray->_ElementSize);
> > -            } else {
> > -               if (!brw->vb.index_bounds_valid) {
> > -                  offset = 0;
> > -                  size = intel_buffer->Base.Size;
> > -               } else {
> > -                  offset = buffer->offset + min_index * buffer->stride;
> > -                  size = (buffer->stride * (max_index - min_index) +
> > -                          glarray->_ElementSize);
> > -               }
> > -            }
> > -            buffer->bo = intel_bufferobj_buffer(brw, intel_buffer,
> > -                                                offset, size);
> > -            drm_intel_bo_reference(buffer->bo);
> > +            enabled_buffer[j] = intel_buffer;
> > +            buffer_range_start[j] = start;
> > +            buffer_range_end[j] = start + range;
> >
> >           input->buffer = j++;
> >           input->offset = 0;
> > @@ -519,7 +530,7 @@ brw_prepare_vertices(struct brw_context *brw)
> >         * probably a service to the poor programmer to do so rather than
> >         * trying to just not render.
> >         */
> > -      assert(input->offset < brw->vb.buffers[input->buffer].bo->size);
> > +      assert(input->offset < intel_buffer->Base.Size);
> >        } else {
> >        /* Queue the buffer object up to be uploaded in the next pass,
> >         * when we've decided if we're doing interleaved or not.
> > @@ -554,6 +565,22 @@ brw_prepare_vertices(struct brw_context *brw)
> >        }
> >     }
> >
> > +   /* Now that we've set up all of the buffers, we walk through and reference
> > +    * each of them.  We do this late so that we get the right size in each
> > +    * buffer and don't reference too little data.
> > +    */
> > +   for (i = 0; i < j; i++) {
> > +      struct brw_vertex_buffer *buffer = &brw->vb.buffers[i];
> > +      if (buffer->bo)
> > +         continue;
> > +
> > +      const uint32_t start = buffer_range_start[i];
> > +      const uint32_t range = buffer_range_end[i] - buffer_range_start[i];
> > +
> > +      buffer->bo = intel_bufferobj_buffer(brw, enabled_buffer[i], start, range);
> > +      drm_intel_bo_reference(buffer->bo);
> > +   }
> > +
> >     /* If we need to upload all the arrays, then we can trim those arrays to
> >      * only the used elements [min_index, max_index] so long as we adjust all
> >      * the values used in the 3DPRIMITIVE i.e. by setting the vertex bias.
> >
>
>