[Mesa-dev] [PATCH v3 04/16] i965/draw: Delay when we get the bo for vertex buffers

Mon May 23 08:57:57 UTC 2016

On Monday, May 23, 2016 1:35:31 AM PDT Jason Ekstrand wrote:
> On Mon, May 23, 2016 at 1:19 AM, Kenneth Graunke <kenneth at whitecape.org>
> wrote:
> 
> > On Friday, May 20, 2016 4:53:15 PM PDT Jason Ekstrand wrote:
> > > The previous code got the BO the first time we encountered it.  However,
> > > this can potentially lead to problems if the BO is used for multiple
> > arrays
> > > with the same buffer object because the range we declare as busy may not
> > be
> > > quite right.  By delaying the call to intel_bufferobj_buffer, we can
> > ensure
> > > that we have the full range for the given buffer.
> > >
> > > Cc: "11.1 11.2" <mesa-stable at lists.freedesktop.org>
> > > Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_draw_upload.c | 71 +++++++++++++++++++
> > +---------
> > >  1 file changed, 49 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > b/src/mesa/drivers/
> > dri/i965/brw_draw_upload.c
> > > index 3ec37f8..0a7725d 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> > > @@ -453,6 +453,11 @@ brw_prepare_vertices(struct brw_context *brw)
> > >     if (brw->vb.nr_buffers)
> > >        return;
> > >
> > > +   /* The range of data in a given buffer represented as [min, max) */
> > > +   struct intel_buffer_object *enabled_buffer[VERT_ATTRIB_MAX];
> > > +   uint32_t buffer_range_start[VERT_ATTRIB_MAX];
> > > +   uint32_t buffer_range_end[VERT_ATTRIB_MAX];
> > > +
> > >     for (i = j = 0; i < brw->vb.nr_enabled; i++) {
> > >        struct brw_vertex_element *input = brw->vb.enabled[i];
> > >        const struct gl_client_array *glarray = input->glarray;
> > > @@ -460,12 +465,31 @@ brw_prepare_vertices(struct brw_context *brw)
> > >        if (_mesa_is_bufferobj(glarray->BufferObj)) {
> > >        struct intel_buffer_object *intel_buffer =
> > >           intel_buffer_object(glarray->BufferObj);
> > > -      unsigned k;
> > > +
> > > +         const uint32_t offset = (uintptr_t)glarray->Ptr;
> > > +
> > > +         uint32_t start, range;
> > > +         if (glarray->InstanceDivisor) {
> > > +            start = offset;
> > > +            range = (glarray->StrideB * ((brw->num_instances /
> > > +                                         glarray->InstanceDivisor) - 1)
> > +
> > > +                     glarray->_ElementSize);
> > > +         } else {
> > > +            if (!brw->vb.index_bounds_valid) {
> > > +               start = 0;
> > > +               range = intel_buffer->Base.Size;
> > > +            } else {
> > > +               start = offset + min_index * glarray->StrideB;
> > > +               range = (glarray->StrideB * (max_index - min_index) +
> > > +                        glarray->_ElementSize);
> > > +            }
> > > +         }
> > >
> > >        /* If we have a VB set to be uploaded for this buffer object
> > >         * already, reuse that VB state so that we emit fewer
> > >         * relocations.
> > >         */
> > > +      unsigned k;
> > >        for (k = 0; k < i; k++) {
> > >           const struct gl_client_array *other =
> > brw->vb.enabled[k]->glarray;
> > >           if (glarray->BufferObj == other->BufferObj &&
> > > @@ -475,6 +499,9 @@ brw_prepare_vertices(struct brw_context *brw)
> > >           {
> > >              input->buffer = brw->vb.enabled[k]->buffer;
> > >              input->offset = glarray->Ptr - other->Ptr;
> > > +
> > > +               buffer_range_start[k] = MIN2(buffer_range_start[k],
> > start);
> > > +               buffer_range_end[k] = MAX2(buffer_range_end[k], start +
> > range);
> >
> > Hey Jason,
> >
> > This patch should work, but I wonder if this is the solution we want.
> >
> > You've found a real bug - we go look for an existing VERTEX_BUFFER_STATE
> > entry that could be reused.  If we found one, we wouldn't call
> > intel_bufferobj_buffer (and thus mark_buffer_gpu_usage), so we failed to
> > update the busy-tracking information.
> >
> > However, rather than widening the existing range to MIN(starts),
> > MAX(ends), I wonder if we shouldn't just mark the new subrange busy.
> >
> > I don't know how common this is, but consider the case where we had:
> >
> >     Existing range: BO #1, [0, 100]
> >     New range: BO #1, [1000, 1100].
> >
> > Your code would widen the busy range to [0, 1100].  Yes, technically,
> > that's how we set up the VERTEX_BUFFER_STATE.  But the GPU should only
> > ever read from the first or second range - never [101, 999].
> >
> > The point of reusing an existing VERTEX_BUFFER_STATE is supposedly to
> > reduce relocations...but it seems unfortunate to expand busy ranges in
> > order to do that, as stalls are way more expensive than relocations.
> >
> > It seems like we could solve this bug by simply adding an
> > intel_bufferobj_buffer call:
> >
> >    for (k = 0; k < i; k++) {
> >       const struct gl_client_array *other = brw->vb.enabled[k]->glarray;
> >       if (glarray->BufferObj == other->BufferObj &&
> >           glarray->StrideB == other->StrideB &&
> >           glarray->InstanceDivisor == other->InstanceDivisor &&
> >           (uintptr_t)(glarray->Ptr - other->Ptr) < glarray->StrideB)
> >       {
> >          input->buffer = brw->vb.enabled[k]->buffer;
> >          input->offset = glarray->Ptr - other->Ptr;
> > --NEW!-> intel_bufferobj_buffer(brw, intel_buffer, offset, size);
> >          break;
> >       }
> >    }
> >
> > I suppose we'd still need to move the offset/size calculations earlier,
> > like you've done in this patch.
> >
> > Thoughts?
> >
> 
> Perhaps?  I did think about that but it seemed like a lot of work.  Also, I
> looked at intel_bufferobj_buffer and found out that it just uses an
> interval so as far as ranges go it works out exactly the same.

Whoops.  Yeah, I'd forgotten that mark_buffer_gpu_usage just does
MIN/MAX anyway.  So yeah...my comment is rather pointless.  If that case
is actually important (and I have no evidence that it is), then we'd
have even more things to fix, first.

> The real reason for the move wasn't the call to intel_bufferobj_buffer but
> so that we could get the correct size to set in VERTEX_BUFFER_STATE.  We
> could emit more VB states to solve that problem and, TBH, that might not be
> a bad idea.  I'm not really sure.
> --Jason

Right.  At any rate, this seems reasonable.

I'm trying to convince myself that the new arrays are initialized in
all cases.  I may have to apply the patch and re-read it tomorrow.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160523/1a6b4ed5/attachment.sig>