[Intel-gfx] [PATCH] drm/i915: Enforce TYPESAFE_BY_RCU vs refcount mb on reinitialisation

Mon Aug 6 11:55:12 UTC 2018

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-08-06 12:12:15)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
>> > underneath us, even when using rcu_read_lock(). We use a strong barrier
>> > on acquiring the refcount during lookup, but this needs to be paired
>> > with a barrier on re-initialising it. Currently we call dma_fence_init,
>> > which ultimately does a plain atomic_set(1) on the refcount, not
>> > providing any memory barriers. As we inspect some state before even
>> > acquiring the refcount in the lookup (by arguing that we can detect
>> > inconsistent requests), that state should be initialised before the
>> > refcount.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> > index 5c2c93cbab12..04a0b8e75533 100644
>> > --- a/drivers/gpu/drm/i915/i915_request.c
>> > +++ b/drivers/gpu/drm/i915/i915_request.c
>> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>> >       rq->timeline = ce->ring->timeline;
>> >       GEM_BUG_ON(rq->timeline == &engine->timeline);
>> >  
>> > +     /*
>> > +      * In order to coordinate with our RCU lookup,
>> > +      * __i915_gem_active_get_rcu(), we need to ensure that the change
>> > +      * to rq->engine is visible before acquring the refcount in the lookup.
>> > +      */
>> > +     smp_wmb();
>> > +
>> 
>> There is quite a lot going on here as we try to get a reference
>> into a shapeshifting request.
>> 
>> By looking at the code acquiring it, dma_fence_get_rcu
>> and dma_fence_init and then the precheck of the request,
>> should memory barrier be:
>> 
>> smb_mb_before_atomic()?
>
> No. The code does have a mb, smb_mb_before_atomic is only for atomics
> that don't themselves enforce a mb and so you need a bit of extra
> weight. On x86, it's not even a mb, just a compiler barrier.
>
>> Admittedly that would be uglier as fence_init hides the atomic_set,
>> but it is atomic on we are serializing. Especially
>> as there is no atomic in callsight.
>
> Right, the suggestion in the thread was to use atomic_set_release(), but
> that requires a lot of deconstruction merely to do the same: it adds
> smp_mb() before the atomic_set.
>
>> Further, as engine and the kref are tightly bound,
>> should we initialize everything not related first, then
>> do engine init, wmb, fence init in a tight proximity?
>
> As we do. The existing order is sufficient for our needs. Everything
> that needs to be initialised before the kref, is -- though I think it's
> overkill as our argument about checking stale state is still correct and
> safe. So what this nails down is the stability of a full referenced
> request -- which is less worrisome as it will only be exposed to the rcu
> onlookers much later, we don't have the same danger of immediate
> exposure to rcu walkers.
>
> What I do think is useful overall is that it gives the companion mb to
> the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
> generally.

Agreed.

I tried to think how to improve the comment pairing, but
the rabbit hole is deep in here. Mentioning the refcount
should guide the reader into right spots tho.

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>

> -Chris