[Intel-gfx] [PATCH 21/22] drm/i915: Enable lockless lookup of request tracking via RCU

Fri Jul 29 08:41:14 UTC 2016

On Thu, Jul 28, 2016 at 09:49:58PM +0100, Chris Wilson wrote:
> On Thu, Jul 28, 2016 at 12:23:40PM +0200, Daniel Vetter wrote:
> > I think we have a race here still: The issue is that the
> > kref_get_unless_zero is an unordered atomic, and the rcu_dereference is
> > only an smb_read_barrier_depends, which doesn't prevent the fetch from
> > happening before the atomic_add_unless.
> > 
> > Well until I opened memory-barriers.txt and learned that atomic_add_unless
> > is a full smp_mb() on both sides on success. That's a bit too tricky for
> > my taste, what about the following comment:
> > 
> > 		/* When request_get_rcu succeds the underlying
> > 		 * atomic_add_unless has a full smp_mb() on both sides.
> > 		 * This ensures that the rcu_dereference() below can't be
> > 		 * reordered before the the refcounting increase has
> > 		 * happened, which prevents the request from being reused.
> > 		 */
> > 
> > I couldn't poke any other holes into this, and we're reusing the fence rcu
> > functions where appropriate. With the comment:
> 

I guess it doesn't hurt to make this really, really clear. Perfect! Well
almost, one nit:

> 
>                 /* What stops the following rcu_dereference() from occuring
>                  * before the above i915_gem_request_get_rcu()? If we were
>                  * to read the value before pausing to get the reference to
>                  * the request, we may not notice a change in the active
>                  * tracker.
>                  *
>                  * The rcu_dereference() is a mere read barrier, which means

s/read barrier/barrier of depending reads/, rcu_dereference is not even a
full rmb!

>                  * that operations after it will appear after, neither the

hence also: s/operations/any operations through the read pointer/

Aside: I'm always impressive how alpha managed to mispredict dependent
reads somehow ... "sorry my magic 8ball had a glitch"!?
-Daniel

>                  * CPU nor the compiler will bring them forwards. However,
>                  * that does not restrict the rcu_dereference() itself. The
>                  * read may be performed earlier by an out-of-order CPU, or
> 		 * adventurous compiler.
>                  *
>                  * The atomic operation at the heart of
>                  * i915_gem_request_get_rcu(), see fence_get_rcu(), is
>                  * atomic_inc_not_zero() which is only a full memory barrier
>                  * when succesful. That is, if i915_gem_request_get_rcu()
>                  * returns the request (and so with the reference counted
>                  * incremented) then the following read for rcu_dereference()
>                  * must occur after the atomic operation and so confirm
>                  * that this request is the one currently being tracked.
>                  */
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch