[Mesa-dev] [PATCH 17/18] anv/allocator: Allow state pools to allocate large states

Jason Ekstrand jason at jlekstrand.net
Thu May 4 18:10:04 UTC 2017


On Thu, May 4, 2017 at 9:26 AM, Juan A. Suarez Romero <jasuarez at igalia.com>
wrote:

> On Thu, 2017-05-04 at 07:49 -0700, Jason Ekstrand wrote:
> > On Thu, May 4, 2017 at 1:00 AM, Juan A. Suarez Romero <jasuarez at igalia.com> wrote:
> > > On Wed, 2017-04-26 at 07:35 -0700, Jason Ekstrand wrote:
> > > > Previously, the maximum size of a state that could be allocated from a
> > > > state pool was a block.  However, this has caused us various issues,
> > > > particularly with shaders, which are potentially very large.  We've also
> > > > hit issues with render passes with a large number of attachments when we
> > > > go to allocate the block of surface state.  This effectively removes the
> > > > restriction on the maximum size of a single state.  (There's still a
> > > > limit of 1MB imposed by a fixed-length bucket array.)
> > > >
> > > > For states larger than the block size, we just grab a large block off of
> > > > the block pool rather than sub-allocating.  When we go to allocate some
> > > > chunk of state and the current bucket does not have state, we try to
> > > > pull a chunk from some larger bucket and split it up.  This should
> > > > improve memory usage if a client occasionally allocates a large block of
> > > > state.
> > > >
> > > > This commit is inspired by some similar work done by Juan A. Suarez
> > > > Romero <jasuarez at igalia.com>.
> > > > ---
> > > >  src/intel/vulkan/anv_allocator.c | 43 ++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 43 insertions(+)
> > > >
> > > > diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
> > > > index 7a687b7..68c389c 100644
> > > > --- a/src/intel/vulkan/anv_allocator.c
> > > > +++ b/src/intel/vulkan/anv_allocator.c
> > > > @@ -650,6 +650,12 @@ anv_fixed_size_state_pool_alloc_new(struct anv_fixed_size_state_pool *pool,
> > > >     struct anv_block_state block, old, new;
> > > >     uint32_t offset;
> > > >
> > > > +   /* If our state is large, we don't need any sub-allocation from a block.
> > > > +    * Instead, we just grab whole (potentially large) blocks.
> > > > +    */
> > > > +   if (state_size >= block_size)
> > > > +      return anv_block_pool_alloc(block_pool, state_size);
> > > > +
> > > >   restart:
> > > >     block.u64 = __sync_fetch_and_add(&pool->block.u64, state_size);
> > > >
> > > > @@ -702,6 +708,43 @@ anv_state_pool_alloc_no_vg(struct anv_state_pool *pool,
> > > >        goto done;
> > > >     }
> > > >
> > > > +   /* Try to grab a chunk from some larger bucket and split it up */
> > > > +   for (unsigned b = bucket + 1; b < ANV_STATE_BUCKETS; b++) {
> > > > +      int32_t chunk_offset;
> > > > +      if (anv_free_list_pop(&pool->buckets[b].free_list,
> > > > +                            &pool->block_pool.map, &chunk_offset)) {
> > > > +         unsigned chunk_size = anv_state_pool_get_bucket_size(b);
> > > > +
> > > > +         if (chunk_size > pool->block_size &&
> > > > +             state.alloc_size < pool->block_size) {
> > > > +            assert(chunk_size % pool->block_size == 0);
> > > > +            /* We don't want to split giant chunks into tiny chunks.  Instead,
> > > > +             * break anything bigger than a block into block-sized chunks and
> > > > +             * then break it down into bucket-sized chunks from there.  Return
> > > > +             * all but the first block of the chunk to the block bucket.
> > > > +             */
> > > > +            const uint32_t block_bucket =
> > > > +               anv_state_pool_get_bucket(pool->block_size);
> > > > +            anv_free_list_push(&pool->buckets[block_bucket].free_list,
> > > > +                               pool->block_pool.map,
> > > > +                               chunk_offset + pool->block_size,
> > > > +                               pool->block_size,
> > > > +                               (chunk_size / pool->block_size) - 1);
> > > > +            chunk_size = pool->block_size;
> > > > +         }
> > > > +
> > > > +         assert(chunk_size % state.alloc_size == 0);
> > > > +         anv_free_list_push(&pool->buckets[bucket].free_list,
> > > > +                            pool->block_pool.map,
> > > > +                            chunk_offset + state.alloc_size,
> > > > +                            state.alloc_size,
> > > > +                            (chunk_size / state.alloc_size) - 1);
> > > > +
> > > > +         state.offset = chunk_offset;
> > > > +         goto done;
> > > > +      }
> > >
> > >
> > > Wouldn't it be better to split the chunk into just two parts, one to be
> > > used for our request, and the other holding the available free space as
> > > one single bucket, instead of splitting it into various parts?
> > >
> > > This way, if we later get a new request that potentially requires the
> > > full (or part of the) free single bucket, we can assign it entirely (or
> > > split it again).
> >
> > Would you mind rephrasing that?  I'm not really sure what you mean.
> > There are two other options in my mind:
>
> After re-thinking, I realize what I wanted to do is nonsense.
>
> In the example below, I was thinking of using 16B out of the 2MB for the
> request and putting the remaining space (2MB - 16B) in *one* bucket. Which,
> of course, is nonsense.
>
> >
> >  1. Fully split the chunk.  This would mean that allocating a 16B state
> > could potentially split a 2MB chunk into 128K smaller chunks.  I'm trying
> > to avoid that quantity of fragmentation.
> >
> >  2. Scatter the chunk through the buckets.  If you got a 2MB chunk, it
> > would generate one each of 1MB, 512KB, 256KB, ..., 64B, and 32B, plus two
> > 16B states.  The issue here is that you end up doing
> > log2(chunk_size / state_size) atomics, and it's very likely that you'll
> > allocate more 16B states soon and every subsequent one will have to go at
> > least one bucket up.  Under the assumption that we tend to allocate
> > identically-sized states, I think this is likely to perform worse than
> > splitting a block_size chunk into block_size / state_size many chunks.
> >
> > What I did is an attempt to split the difference by splitting it at two
> > levels rather than one or all.
> >
>
>
> I see now. Thanks for the explanation!
>

No problem.  I'll make that a comment.


> Reviewed-by: Juan A. Suarez Romero <jasuarez at igalia.com>
>

Thanks!


>
>
> > --Jason
> >
> > >         J.A.
> > >
> > >
> > >
> > > > +   }
> > > > +
> > > >     state.offset = anv_fixed_size_state_pool_alloc_new(&pool->buckets[bucket],
> > > >                                                        &pool->block_pool,
> > > >                                                        state.alloc_size,
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >
> >
> >
>