[Intel-gfx] [RFC 1/4] drm/i915: Implement a framework for batch buffer pools

Daniel Vetter daniel at ffwll.ch
Thu Jun 19 21:07:30 CEST 2014


On Thu, Jun 19, 2014 at 10:35:44AM -0700, Volkin, Bradley D wrote:
> On Thu, Jun 19, 2014 at 02:48:29AM -0700, Tvrtko Ursulin wrote:
> > 
> > Hi Brad,
> > 
> > On 06/18/2014 05:36 PM, bradley.d.volkin at intel.com wrote:
> > > From: Brad Volkin <bradley.d.volkin at intel.com>
> > >
> > > This adds a small module for managing a pool of batch buffers.
> > > The only current use case is for the command parser, as described
> > > in the kerneldoc in the patch. The code is simple, but separating
> > > it out makes it easier to change the underlying algorithms and to
> > > extend to future use cases should they arise.
> > >
> > > The interface is simple: alloc to create an empty pool, free to
> > > clean it up; get to obtain a new buffer, put to return it to the
> > > pool. Note that all buffers must be returned to the pool before
> > > freeing it.
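
For reference, a minimal sketch of what that interface could look like, based
purely on the description above (names and fields are illustrative, not taken
from the patch):

    struct i915_gem_batch_pool {
            struct drm_device *dev;
            struct list_head cache_list;    /* buffers available for reuse */
    };

    /* Create an empty pool; free it once every buffer has been returned. */
    struct i915_gem_batch_pool *
    i915_gem_batch_pool_alloc(struct drm_device *dev);
    void i915_gem_batch_pool_free(struct i915_gem_batch_pool *pool);

    /* Obtain a buffer of at least @size bytes, allocating one if none is free. */
    struct drm_i915_gem_object *
    i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);

    /* Return @obj to the pool, making it eligible for reuse (and purging). */
    void i915_gem_batch_pool_put(struct i915_gem_batch_pool *pool,
                                 struct drm_i915_gem_object *obj);
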
> > >
> > > The pool has a maximum number of buffers allowed due to some tests
> > > (e.g. gem_exec_nop) creating a very large number of buffers (e.g.
> > > ___). Buffers are purgeable while in the pool, but not explicitly
> > > truncated in order to avoid overhead during execbuf.
> > >
> > > Locking is currently based on the caller holding the struct_mutex.
> > > We already do that in the places where we will use the batch pool
> > > for the command parser.
> > >
> > > Signed-off-by: Brad Volkin <bradley.d.volkin at intel.com>
> > > ---
> > >
> > > r.e. pool capacity
> > > My original testing showed something like thousands of buffers in
> > > the pool after a gem_exec_nop run. But when I reran with the max
> > > check disabled just now to get an actual number for the commit
> > > message, the number was more like 130.  I developed and tested the
> > > changes incrementally, and suspect that the original run was before
> > > I implemented the actual copy operation. So I'm inclined to remove
> > > or at least increase the cap in the final version. Thoughts?
> > 
> > Some random thoughts:
> > 
> > Is it strictly necessary to cap the pool size? I ask because it seems to 
> > be introducing a limit where so far there wasn't an explicit one.
> 
> No, I only added it because there were a huge number of buffers in the
> pool at one point. But that seems to have been an artifact of my
> development process, so unless someone says they really want to keep
> the cap, I'm going to drop it in the next rev.

As long as we have a single global LRU and correctly mark buffers as
purgeable, the shrinker logic will size your working set optimally. So I
don't think a cap is necessary as long as you reuse eagerly.
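
For illustration, roughly what eager reuse plus purgeable marking could look
like (a sketch, not the patch; obj->madv and the I915_MADV_* values are the
existing purgeable machinery, while the batch_pool_list link on the object is
hypothetical):

    static void batch_pool_put(struct i915_gem_batch_pool *pool,
                               struct drm_i915_gem_object *obj)
    {
            /* Back on the free list; the shrinker may now reclaim the pages. */
            list_add_tail(&obj->batch_pool_list, &pool->cache_list);
            obj->madv = I915_MADV_DONTNEED;
    }

    static struct drm_i915_gem_object *
    batch_pool_get(struct i915_gem_batch_pool *pool, size_t size)
    {
            struct drm_i915_gem_object *obj, *next;

            list_for_each_entry_safe(obj, next, &pool->cache_list,
                                     batch_pool_list) {
                    /* The shrinker may already have purged the backing pages. */
                    if (obj->madv == __I915_MADV_PURGED) {
                            list_del(&obj->batch_pool_list);
                            drm_gem_object_unreference(&obj->base);
                            continue;
                    }
                    if (obj->base.size >= size) {
                            list_del_init(&obj->batch_pool_list);
                            obj->madv = I915_MADV_WILLNEED;
                            return obj;
                    }
            }

            return NULL;    /* caller falls back to i915_gem_alloc_object() */
    }
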

> > Are object sizes generally page aligned, or have you seen all sizes in
> > the distribution? Either way, I am wondering whether some sort of
> > rounding up would be more efficient. Would it cause a problem if a
> > slightly larger object was returned?
> 
> I believe objects must have page-aligned sizes.

Yes.
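
Right, so a requested size could simply be rounded up before the pool is
searched; something like this (sketch only, the helper name is made up):

    /* GEM object sizes are page aligned, so round the request up and accept
     * any pooled buffer that is at least that large. */
    static bool batch_obj_fits(const struct drm_i915_gem_object *obj,
                               size_t requested)
    {
            return obj->base.size >= round_up(requested, PAGE_SIZE);
    }
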

> > Given that objects are only ever added to the pool, once the maximum
> > number is allocated and there are no free ones of the exact size it nags
> > userspace with EAGAIN and retires objects. But I wonder if the above
> > points could reduce that behaviour?
> > 
> > Could we get away without tracking the handed-out objects in a list and
> > just keep a list of available ones? In that case, if an object can only
> > ever be either in the free pool or on one of the existing GEM
> > active/inactive lists, could the same list head be used?
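
To make the free-list-only idea concrete, the per-object link could look like
this (sketch; the field name is made up, and reusing an existing list head
would only be safe if it is guaranteed unused while the buffer is handed out):

    struct drm_i915_gem_object {
            /* ... existing fields ... */

            /*
             * Link into the batch pool's cache_list while the buffer is free.
             * Empty (list_del_init) while the buffer is handed out, during
             * which the object is tracked only by the usual GEM
             * active/inactive lists.
             */
            struct list_head batch_pool_list;
    };
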
> > 
> > Could it use its own locking just as easily? Just thinking that if the
> > future goal is fine-grained locking, this seems self-contained enough to
> > be doable straight away, unless I am missing something.
> 
> I thought about putting a spinlock in the batch pool struct for
> exactly that reason. But we have to hold the struct_mutex for at least
> drm_gem_object_unreference(). I guess we don't actually need it for
> i915_gem_alloc_object(). I opted to just add mutex_is_locked()
> checks so that the dependency on the lock being held is hopefully easy
> to find by the poor soul who eventually tries to tackle the locking
> improvements :). But if people would rather I add a lock to the batch
> pool struct, I can do that.
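
For example, the entry points could carry the check like this (a sketch,
reusing the names from the earlier sketches):

    struct drm_i915_gem_object *
    i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size)
    {
            /* Document the struct_mutex dependency for a future locking rework. */
            WARN_ON(!mutex_is_locked(&pool->dev->struct_mutex));

            return batch_pool_get(pool, size);
    }
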

If we ever get around to more fine-grained locking, I expect three classes
of locks: per-GEM-object locks, one LRU lock for all these lists, and a
low-level lock for actually submitting batches to the hw (i.e. tail/ring
updates and execlist submission). But before we have that, adding
fine-grained locks that must always nest within dev->struct_mutex will
only make the eventual conversion harder.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
