[Intel-gfx] [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance

Daniel Vetter daniel at ffwll.ch
Wed Aug 10 10:12:37 UTC 2016


On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > 
> > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > 
> > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > In the debate as to whether the second read of active->request is
> > > > > > > > ordered after the dependent reads of the first read of active->request,
> > > > > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > > > > assured.
> > > > > > > > 
> > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > 
> > > > > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > > > > > Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > > > > r-b confirmed.
> > > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > > mandate the local processor order (that being the order of evaluation of
> > > > > > request = *active; engine = *request; *active). The two *active are
> > > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > Should not a RCU read side lock be involved?
> > > Yes, we use the rcu read lock here. The question is about the visibility
> > > of the other processor's writes vs the local processor's ordering. Before
> > > the other processor can overwrite the request during reallocation, it will
> > > have updated active->request and gone through a wmb. During the busy
> > > ioctl's read of the request, we want to make sure that the values we
> > > read (request->engine, request->seqno) have not been overwritten as we
> > > do so - and we do that by serialising the second pointer check with the
> > > other cpus.
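
A minimal sketch of the pattern being described; the field names come from
the discussion above, while the function shape, types and return convention
are illustrative rather than the exact i915 code:

    /*
     * Sample request->engine/seqno without taking a reference, detecting
     * concurrent retirement/reuse by re-checking the pointer afterwards.
     * Returns false if the race was lost and the sampled values must be
     * discarded (e.g. treated as busy and resampled).
     */
    static bool sample_active(struct i915_gem_active *active,
                              struct intel_engine_cs **engine, u32 *seqno)
    {
            struct drm_i915_gem_request *request;

            rcu_read_lock();
            request = rcu_dereference(active->request);
            if (!request) {
                    rcu_read_unlock();
                    return true;    /* idle */
            }

            *engine = READ_ONCE(request->engine);
            *seqno = READ_ONCE(request->seqno);

            /*
             * Order the loads above before the second load of
             * active->request; this pairs with the write barrier issued
             * before the request memory can be reused on another CPU.
             */
            smp_rmb();
            if (request != rcu_access_pointer(active->request)) {
                    rcu_read_unlock();
                    return false;   /* raced, values may be stale */
            }

            rcu_read_unlock();
            return true;
    }
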
> > 
> > As discussed on IRC, some mechanism other than an improvised spinning
> > loop plus a few SMP barriers thrown around would be much preferred.
> > 
> > You suggested a seqlock, and it would likely be ok.
> 
> I was comparing the read latching because the two are identical. Using a
> read/write seqlock around the request modification does not prevent all
> the dangers (such as using kzalloc()) and introduces a second sequence
> counter alongside the one we already have. And for good reason the seqlock
> documentation says to use RCU here. Which puts us in a bit of a catch-22,
> still having to guard against SLAB_DESTROY_BY_RCU.
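
The SLAB_DESTROY_BY_RCU guard being referred to looks roughly like this;
the ref field and the put helper are illustrative names, not the exact
i915 ones:

    /*
     * With SLAB_DESTROY_BY_RCU the memory of a freed request may be
     * recycled for a new request before our RCU read-side critical
     * section ends, so a reference is only good if the refcount was
     * still non-zero *and* the slot still points at the same request.
     */
    struct drm_i915_gem_request *request;

    rcu_read_lock();
    request = rcu_dereference(active->request);
    if (request && !kref_get_unless_zero(&request->ref))
            request = NULL;                 /* refcount already hit zero */
    if (request && request != rcu_access_pointer(active->request)) {
            request_put(request);           /* slab reuse raced with us */
            request = NULL;
    }
    rcu_read_unlock();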

Since I was part of that irc party too: I think this is perfectly fine
as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
protect against zombies. This is exactly the design fence_get_rcu is meant
to be used in. But for this fastpath we can avoid even the zombie
protection if we're careful enough. Trying to make this less scary by
using a seqlock instead of plain rcu would run counter to the overall
fence_get_rcu design. And since the goal is to share fences far and wide,
we don't want to build up a slightly different locking scheme here where
fence pointers are not really protected by rcu.
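
The fence_get_rcu() pattern in question, roughly (a sketch against the
linux/fence.h API of the time, with "slot" standing in for whatever
RCU-protected pointer holds the fence):

    struct fence *fence;

    rcu_read_lock();
    fence = rcu_dereference(*slot);
    /* fence_get_rcu() is kref_get_unless_zero() on the fence refcount */
    if (fence && !fence_get_rcu(fence))
            fence = NULL;                   /* already being freed */
    if (fence && fence != rcu_access_pointer(*slot)) {
            fence_put(fence);               /* slot was reassigned meanwhile */
            fence = NULL;
    }
    rcu_read_unlock();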

Given all that I think the open-coded scary dance (with lots of comments)
is acceptable, also since this clearly is a fastpath that userspace loves
to beat on.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch