[Intel-gfx] [PATCH 10/16] drm/i915: Remove (struct_mutex) locking for wait-ioctl

Thu Aug 4 10:02:41 UTC 2016

On Thu, Aug 04, 2016 at 11:26:04AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-01 at 19:22 +0100, Chris Wilson wrote:
> > With a bit of care (and leniency) we can iterate over the object and
> > wait for previous rendering to complete with judicial use of atomic
> > reference counting. The ABI requires us to ensure that an active object
> > is eventually flushed (like the busy-ioctl) which is guaranteed by our
> > management of requests (i.e. everything that is submitted to hardware is
> > flushed in the same request). All we have to do is ensure that we can
> > detect when the requests are complete for reporting when the object is
> > idle (without triggering ETIME) - this is handled by
> > __i915_wait_request.
> > 
> > The biggest danger in the code is walking the object without holding any
> > locks. We iterate over the set of last requests and carefully grab a
> > reference upon it. (If it is changing beneath us, that is the usual
> > userspace race and even with locking you get the same indeterminate
> > results.) If the request is unreferenced beneath us, it will be disposed
> > of into the request cache - so we have to carefully order the retrieval
> > of the request pointer with its removal, and to do this we employ RCU on
> > the request cache and upon the last_request pointer tracking.
> > 
> > The impact of this is actually quite small - the return to userspace
> > following the wait was already lockless. What we achieve here is
> > completing an already finished wait without hitting the struct_mutex,
> > our hold is quite short and so we are typically just a victim of
> > contention rather than a cause.
> > 
> 
> The commit message seems little bit disconnect with the code, making
> the patch sound much more complex than it is. Is it up to date? Or
> maybe parts of this explanation would belong to an earlier patch?

"""
With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicial use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().

The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much gain in 
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention 
point!
"""
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre