[Intel-gfx] [PATCH] RFC drm/i915: Remove (struct_mutex) locking for wait-ioctl
Chris Wilson
chris at chris-wilson.co.uk
Wed Dec 23 04:26:46 PST 2015
On Wed, Dec 23, 2015 at 12:13:15PM +0000, Chris Wilson wrote:
> On Wed, Dec 23, 2015 at 12:05:00PM +0000, Chris Wilson wrote:
> > On Wed, Dec 23, 2015 at 11:32:23AM +0000, Chris Wilson wrote:
> > > On Wed, Dec 23, 2015 at 11:19:23AM +0000, Chris Wilson wrote:
> > > > With a bit of care (and leniency) we can iterate over the object and
> > > > wait for previous rendering to complete with judicial use of atomic
> > > > reference counting. The ABI requires us to ensure that an active object
> > > > is eventually flushed (like the busy-ioctl) which is guaranteed by our
> > > > management of requests (i.e. everything that is submitted to hardware is
> > > > flushed in the same request). All we have to do is ensure that we can
> > > > detect when the requests are complete for reporting when the object is
> > > > idle (without triggering ETIME) - this is handled by
> > > > __i915_wait_request.
> > > >
> > > > The biggest danger in the code is walking the object without holding any
> > > > locks. We iterate over the set of last requests and carefully grab a
> > > > reference upon it. (If it is changing beneath us, that is the usual
> > > > userspace race and even with locking you get the same indeterminate
> > > > results.) If the request is unreferenced beneath us, it will be disposed
> > > > of into the request cache - so we have to carefully order the retrieval
> > > > of the request pointer with its removal.
> > > >
> > > > The impact of this is actually quite small - the return to userspace
> > > > following the wait was already lockless. What we achieve here is
> > > > completing an already finished wait without hitting the struct_mutex,
> > > > our hold is quite short and so we are typically just a victim of
> > > > contention rather than a cause.
> > > >
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > >
> > > Some food for thought. I would especially like someone to poke holes in
> > > the racy pointer lookup and check the store mb() versus the
> > > rcu-reference.
> >
> > So what I think is the missing element here is then
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 871aaae1a9d5..4d4ab8e6423f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4193,7 +4193,8 @@ i915_gem_load(struct drm_device *dev)
> > dev_priv->requests =
> > kmem_cache_create("i915_gem_request",
> > sizeof(struct drm_i915_gem_request), 0,
> > - SLAB_HWCACHE_ALIGN,
> > + SLAB_HWCACHE_ALIGN |
> > + SLAB_DESTROY_BY_RCU,
> > NULL);
>
> And we promptly run into memory exhaustion issues.
And with some carefully placed synchronize_rcu() we may be able to get
the best of both worlds...
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list