[Intel-gfx] [PATCH] drm/i915: Enforce TYPESAFE_BY_RCU vs refcount mb on reinitialisation

Chris Wilson chris at chris-wilson.co.uk
Mon Aug 6 11:41:45 UTC 2018


Quoting Mika Kuoppala (2018-08-06 12:12:15)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
> > underneath us, even when using rcu_read_lock(). We use a strong barrier
> > on acquiring the refcount during lookup, but this needs to be paired
> > with a barrier on re-initialising it. Currently we call dma_fence_init,
> > which ultimately does a plain atomic_set(1) on the refcount, not
> > providing any memory barriers. As we inspect some state before even
> > acquiring the refcount in the lookup (by arguing that we can detect
> > inconsistent requests), that state should be initialised before the
> > refcount.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 5c2c93cbab12..04a0b8e75533 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >       rq->timeline = ce->ring->timeline;
> >       GEM_BUG_ON(rq->timeline == &engine->timeline);
> >  
> > +     /*
> > +      * In order to coordinate with our RCU lookup,
> > +      * __i915_gem_active_get_rcu(), we need to ensure that the change
> > +      * to rq->engine is visible before acquiring the refcount in the lookup.
> > +      */
> > +     smp_wmb();
> > +
> 
> There is quite a lot going on here as we try to get a reference
> into a shapeshifting request.
> 
> By looking at the code acquiring it, dma_fence_get_rcu
> and dma_fence_init and then the precheck of the request,
> should memory barrier be:
> 
> smp_mb__before_atomic()?

No. The code does have a mb; smp_mb__before_atomic() is only for
atomics that don't themselves enforce a mb and so need the bit of
extra weight. On x86, it's not even a mb, just a compiler barrier.
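
To put it concretely (an illustrative sketch with a made-up struct
foo, not the i915 code): smp_mb__before_atomic() is there for the
void-returning atomics, whereas the value-returning RMW underneath
dma_fence_get_rcu() is fully ordered on success and needs no help:

#include <linux/atomic.h>

struct foo {
	atomic_t users;
	int payload;
};

static void foo_publish(struct foo *f)
{
	f->payload = 1;

	/* atomic_inc() returns void and implies no ordering of its
	 * own, so it needs the extra weight to order the store above
	 */
	smp_mb__before_atomic();
	atomic_inc(&f->users);
}

static bool foo_tryget(struct foo *f)
{
	/* value-returning RMW: full mb on success, no helper needed */
	return atomic_inc_not_zero(&f->users);
}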

> Admittedly that would be uglier as fence_init hides the atomic_set,
> but it is the atomic on which we are serializing. Especially
> as there is no atomic in sight at the call site.

Right, the suggestion in the thread was to use atomic_set_release(), but
that requires a lot of deconstruction merely to do the same: it adds
smp_mb() before the atomic_set.
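
For reference, the asm-generic fallback spells that out (abridged
from memory; on x86 the release store needs only a compiler barrier):

/* include/asm-generic, abridged */
#define atomic_set_release(v, i)  smp_store_release(&(v)->counter, (i))

#define smp_store_release(p, v)				\
do {							\
	smp_mb();	/* order all prior stores */	\
	WRITE_ONCE(*p, v);				\
} while (0)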

> Further, as the engine and the kref are tightly bound,
> should we initialize everything unrelated first, then
> do the engine init, wmb and fence init in tight proximity?

As we do. The existing order is sufficient for our needs. Everything
that needs to be initialised before the kref, is -- though I think it's
overkill, as our argument about checking stale state is still correct
and safe. So what this nails down is the stability of a fully
referenced request -- which is less worrisome, as it is only exposed to
the rcu onlookers much later; we don't have the same danger of
immediate exposure to rcu walkers.
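
Schematically, the reinitialisation order the patch pins down
(simplified from i915_request_alloc() with the patch applied;
arguments elided):

	/* rq is SLAB_TYPESAFE_BY_RCU memory and may already be
	 * visible to rcu readers performing the lookup
	 */
	rq->engine = engine;
	rq->timeline = ce->ring->timeline;
	/* ... the rest of the state the lookup may inspect ... */

	/* make the state above visible before the refcount */
	smp_wmb();

	/* kref_init(): ultimately a plain atomic_set(1) on the refcount */
	dma_fence_init(&rq->fence, &i915_fence_ops, ...);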

What I do think is useful overall is that it gives the companion mb to
the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
generally.
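
That is, the lookup it pairs with, abridged from
__i915_gem_active_get_rcu():

	rcu_read_lock();
	do {
		rq = rcu_dereference(active->request);
		if (!rq || i915_request_completed(rq))
			break;	/* inspects state set before the wmb */

		/* kref_get_unless_zero(): full mb on success, the
		 * companion to the smp_wmb() on reinitialisation
		 */
		rq = i915_request_get_rcu(rq);

		/* recheck that rq was not recycled beneath us */
		if (!rq || rq == rcu_access_pointer(active->request))
			break;

		i915_request_put(rq);
	} while (1);
	rcu_read_unlock();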
-Chris

