[Intel-gfx] [PATCH 1/7] drm/i915: Introduce i915_address_space.mutex

Thu Jul 12 07:01:55 UTC 2018

On Wed, Jul 11, 2018 at 10:49:51AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2018-07-11 10:36:36)
> > On Wed, Jul 11, 2018 at 11:33:26AM +0200, Daniel Vetter wrote:
> > > On Wed, Jul 11, 2018 at 08:36:02AM +0100, Chris Wilson wrote:
> > > > Add a mutex into struct i915_address_space to be used while operating on
> > > > the vma and their lists for a particular vm. As this may be called from
> > > > the shrinker, we taint the mutex with fs_reclaim so that from the start
> > > > lockdep warns us if we are caught holding the mutex across an
> > > > allocation. (With such small steps we will eventually rid ourselves of
> > > > struct_mutex recursion!)
> > > > 
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > 
> > > Not sure it exists in a branch of yours already, but here's my thoughts on
> > > extending this to the address_space lrus and the shrinker callback (which
> > > I think would be the next step with good pay-off):
> > > 
> > > 1. make sure pin_count is protected by reservation_obj.
> 
> It's vma->pin_count, so I guarded it with vm->mutex (along with the
> drm_mm and vm->*list, with some vigorous pruning of lists to reduce the
> locking surface). There's also the obj->vma_list and trees that are
> moved to a obj->vma_lock.

Hm, Christian König's series to wrap dma_buf_map/unmap in the
reservation_obj is why I thought we'd need that. But just for the unmap of
foreign objects I guess we can always punt that to some worker, or just
don't bother if it's contended.

> > > 2. grab the vm.mutex when walking LRUs everywhere. This is going to be
> > > tricky for ggtt because of runtime PM. Between lock-dropping, carefully
> > > avoiding rpm when cleaning up objects and just grabbing an rpm wakeref
> > > when walking the ggtt vm this should be possible to work around (since for
> > > the fences we clearly need to be able to nest the vm.mutex within rpm or
> > > we're busted).
> > > 3. In the shrinker trylock the reservation_obj and treat a failure to get
> > > the lock as if pin_count is elevated. If we can't shrink enough then grab
> > > a temporary reference to the bo using kref_get_unless_zero, drop the
> > > vm.mutex (since that's what gave us the weak ref) and do a blocking
> > > reservation_obj lock.
> > 
> > Ok this doesn't work, because reservation_obj needs to allow allocations.
> > But compared to our current lock stealing trickery the above scheme
> > reduces possibilities to shrink, or at least rate-limit command submission
> > somewhat. Not sure how to best tackle that.
> 
> Right, the only way is to avoid us using reservation_obj->lock inside
> the shrinker, which is quite easy (for the moment at least, I've haven't
> completed the i915_request eradication of struct_mutex for ordering
> which is likely to require more reservation_obj or similar).

Yeah I thihnk as long as i915 doesn't need it we can get away with
trylocks for foreign objects (which I think will need it, sooner or
later).

> I do have a branch that gets us to no struct_mutex inside intel_display
> and to remove struct_mutex from the shrinker within 10 more ugly patches
> that pass CI, which is a start. Something that I want to drip feed
> slowly so that we can catch the high MTBF regressions that are inevitable.

Very much agreed on drip feeding :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch