[Mesa-dev] [PATCH 3/4] i965/drm: Searching for a cached buffer for reuse

James Xiong james.xiong at intel.com
Fri May 11 15:35:04 UTC 2018


On Thu, 10 May 2018 13:56:12 -0700
Kenneth Graunke <kenneth at whitecape.org> wrote:

> On Friday, May 4, 2018 5:56:04 PM PDT James Xiong wrote:
> > From: "Xiong, James" <james.xiong at intel.com>
> > 
> > Now that a bucket contains cached buffers with different sizes, go
> > through its list and search for a cached buffer with enough size.
> > 
> > Signed-off-by: Xiong, James <james.xiong at intel.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_bufmgr.c | 21 +++++++++++++++------
> >  src/util/list.h                        |  5 +++++
> >  2 files changed, 20 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > index 6a9b005..5235aa6 100644
> > --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
> > @@ -281,7 +281,7 @@ cached_bo_for_size(struct brw_bufmgr *bufmgr,
> >     assert(!(busy && zeroed));
> >  
> >     if(bucket != NULL && !list_empty(&bucket->head)) {
> > -      struct brw_bo *bo;
> > +      struct brw_bo *bo, *temp_bo;
> >  retry:
> >        bo = NULL;
> >  
> > @@ -292,8 +292,13 @@ retry:
> >            * asked us to zero the buffer, we don't want this
> >            * because we are going to mmap it.
> >            */
> > -         bo = LIST_ENTRY(struct brw_bo, bucket->head.prev, head);
> > -         list_del(&bo->head);
> > +         LIST_FOR_EACH_ENTRY_REV(temp_bo, &bucket->head, head) {
> > +            if (temp_bo->size >= size) {
> > +               bo = temp_bo;
> > +               list_del(&bo->head);
> > +               break;
> > +            }
> > +         }
> >        } else {
> >           /* For non-render-target BOs (where we're probably
> >            * going to map it first thing in order to fill it
> > @@ -302,9 +307,13 @@ retry:
> >            * allocating a new buffer is probably faster than
> >            * waiting for the GPU to finish.
> >            */
> > -         bo = LIST_ENTRY(struct brw_bo, bucket->head.next, head);
> > -         if (!brw_bo_busy(bo)) {
> > -            list_del(&bo->head);
> > +         LIST_FOR_EACH_ENTRY(temp_bo, &bucket->head, head) {
> > +            if (temp_bo->size >= size &&
> > +                !brw_bo_busy(temp_bo)) {
> > +               bo = temp_bo;
> > +               list_del(&bo->head);
> > +               break;
> > +            }
> >           }
> >        }
> >  
> > diff --git a/src/util/list.h b/src/util/list.h
> > index 6edb750..9362072 100644
> > --- a/src/util/list.h
> > +++ b/src/util/list.h
> > @@ -189,6 +189,11 @@ static inline void list_validate(struct list_head *list)
> >          &pos->member != (head);                                         \
> >          pos = container_of(pos->member.next, pos, member))
> >  
> > +#define LIST_FOR_EACH_ENTRY_REV(pos, head, member)                      \
> > +   for (pos = NULL, pos = container_of((head)->prev, pos, member);      \
> > +        &pos->member != (head);                                         \
> > +        pos = container_of(pos->member.prev, pos, member))
> > +
> >  #define LIST_FOR_EACH_ENTRY_SAFE(pos, storage, head, member)            \
> >     for (pos = NULL, pos = container_of((head)->next, pos, member),      \
> >          storage = container_of(pos->member.next, pos, member);          \
> 
> Hi James,
> 
> I don't particularly like the idea of introducing linear linked list
> walks to find a buffer.  It's a really nice property to have every
> buffer in the bucket be suitable and be able to grab one immediately.
> It sounds like purging things more often in patch 4 reduces the cost
> of this new list walk, but...it'd be nice to not add it in the first
> place.
Yes, the purge, along with the cleanup routine that kicks in from time to
time and removes any cached buffers freed more than a second ago, keeps the
list short. One alternative I can think of is a fixed-size array of cached
buffers kept sorted by size, so that we could use a binary search instead
of walking the doubly linked list.
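A very rough sketch of that idea, purely illustrative (the array type, its
capacity, and the helper name are invented here; it assumes struct brw_bo
and its size field from brw_bufmgr.h):

  #include <stdint.h>

  /* Hypothetical: each bucket keeps its cached BOs in a fixed-size array
   * sorted by size, so the smallest buffer that is big enough can be
   * found by binary search instead of a linear list walk. */
  struct cached_bo_array {
     struct brw_bo *bos[64];   /* sorted ascending by bos[i]->size */
     unsigned count;
  };

  static int
  find_smallest_fit(const struct cached_bo_array *cache, uint64_t size)
  {
     int lo = 0, hi = (int)cache->count - 1, best = -1;

     while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (cache->bos[mid]->size >= size) {
           best = mid;        /* big enough; keep looking for a tighter fit */
           hi = mid - 1;
        } else {
           lo = mid + 1;
        }
     }
     return best;             /* index of the best fit, or -1 if none fits */
  }

The busy/zeroed checks would still have to be applied to the candidate, so
the worst case can still degrade to scanning, but the common case becomes
O(log n).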
> 
> This also conflicts with my new VMA allocator approach a lot, which
> assumes that all buffers in a bucket are the same size...
Yes.
> 
> Still, you've shown some pretty nice memory savings.  Scott and I were
> wondering if it would be possible to achieve a similar effect by
> tuning the bucket sizes - either adding more, or adjusting some.  The
> sizes we have are pretty arbitrary - we started with powers of two,
> and then added more intermediate sizes back in 2010 to reduce memory
> waste.
I think it would work. How much memory we save depends on how many buckets
we have: the more buckets, the smaller the size gap between adjacent
buckets and the bigger the saving. The catch is that we would then have to
walk the bucket list to find the right bucket.
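Just to put a number on that trade-off (illustrative only, not driver
code): with bucket sizes spaced by a constant ratio r, a request that just
misses one bucket gets rounded up to the next, so the wasted fraction is
bounded by (r - 1) / r.

  /* Illustrative only: worst-case fraction of an allocation wasted when
   * requests are rounded up to the next bucket, for buckets spaced by a
   * constant ratio r. */
  static double
  worst_case_waste(double r)
  {
     return (r - 1.0) / r;   /* r = 2.0 -> 50%, r = 1.25 -> 20% */
  }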
> 
> For example, 1920x1080 RGBA8888 Y-tiled buffers are going to be really
> common, and should be 8,355,840 bytes...but the nearest bucket size is
> 8,388,608, so we waste 32KB.  We might want to make sure there's a
> bucket at that exact size.
Yes, but the tiling, pitch, and height alignments might change, for example
on a new platform with different alignment requirements; it would be a
chore to keep track of all those changes and maintain the bucket list by
hand.
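Just to show how that 8,355,840 figure falls out of the alignment rules (a
standalone illustration assuming 128-byte-wide, 32-row Y-tiles; the real
pitch/height alignment lives in the tiling code and is exactly the part
that can change per platform):

  #include <stdio.h>

  #define ALIGN(v, a) (((v) + (a) - 1) & ~((unsigned long long)(a) - 1))

  int main(void)
  {
     unsigned long long pitch  = ALIGN(1920ull * 4, 128); /* 7680 bytes/row */
     unsigned long long height = ALIGN(1080ull, 32);      /* 1088 rows      */
     unsigned long long size   = pitch * height;
     unsigned long long bucket = 8ull * 1024 * 1024;      /* nearest bucket */

     printf("surface %llu, bucket %llu, waste %llu\n",
            size, bucket, bucket - size);
     /* prints: surface 8355840, bucket 8388608, waste 32768 */
     return 0;
  }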
Another idea is to introduce a self-adjusting bucket list/array on top of
the existing buckets: we would keep statistics on allocation sizes and
frequencies, then dynamically cache some of the most frequently allocated
sizes (for example, the top 10). Thoughts?
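Roughly what I have in mind, just to illustrate (the names and the fixed
table size are invented, and a real version would need aging/eviction and
a policy for promoting sizes to buckets):

  #include <stdint.h>

  /* Hypothetical: count how often each exact allocation size is requested,
   * so the hottest sizes can be given dedicated dynamic buckets. */
  struct size_stat {
     uint64_t size;
     uint64_t hits;
  };

  #define MAX_TRACKED_SIZES 64

  struct alloc_stats {
     struct size_stat entries[MAX_TRACKED_SIZES];
     unsigned count;
  };

  static void
  record_alloc_size(struct alloc_stats *stats, uint64_t size)
  {
     for (unsigned i = 0; i < stats->count; i++) {
        if (stats->entries[i].size == size) {
           stats->entries[i].hits++;
           return;
        }
     }
     if (stats->count < MAX_TRACKED_SIZES) {
        stats->entries[stats->count].size = size;
        stats->entries[stats->count].hits = 1;
        stats->count++;
     }
     /* A real version would also age out entries and periodically promote
      * the top N (say, 10) sizes to dedicated buckets. */
  }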
> 
> I've wondered whether we should just use a geometric series for the
> bucket sizes, rather than the complex bucket_for_size() calculation
> we have now.  We might also want to drop some of the huge ones...
Agreed.
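For reference, something like the following would do it (illustrative
only; the start size, the 1.25 ratio, and the cap are arbitrary here, and
a real version would round each size up to a page multiple):

  #include <stdint.h>

  /* Illustrative only: fill a bucket-size table with a geometric series
   * instead of the hand-tuned progression in bucket_for_size(). */
  static unsigned
  init_geometric_buckets(uint64_t *sizes, unsigned max_buckets,
                         uint64_t start, uint64_t cap)
  {
     unsigned n = 0;

     for (uint64_t size = start; n < max_buckets && size <= cap;
          size += size / 4)            /* ~1.25x between adjacent buckets */
        sizes[n++] = size;

     return n;
  }

Starting at 4096 with a 64 MiB cap that gives 4096, 5120, 6400, 8000, ...,
and the size-to-bucket mapping becomes a simple search rather than the
current special cases.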
> 
> --Ken


