[Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline

Wed Jul 29 13:53:27 UTC 2020

Quoting Chris Wilson (2020-07-29 14:42:06)
> Quoting Tvrtko Ursulin (2020-07-29 13:40:38)
> > 
> > On 28/07/2020 15:28, Chris Wilson wrote:
> > > Quoting Tvrtko Ursulin (2020-07-17 14:04:58)
> > >>
> > >> On 15/07/2020 12:50, Chris Wilson wrote:
> > >>> Rather than require the next timeline after idling to match the MRU
> > >>> before idling, reset the index on the node and allow it to match the
> > >>> first request. However, this requires cmpxchg(u64) and so is not trivial
> > >>> on 32b, so for compatibility we just fallback to keeping the cached node
> > >>> pointing to the MRU timeline.
> > >>>
> > >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > >>> ---
> > >>>    drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
> > >>>    1 file changed, 19 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> > >>> index 0854b1552bc1..6737b5615c0c 100644
> > >>> --- a/drivers/gpu/drm/i915/i915_active.c
> > >>> +++ b/drivers/gpu/drm/i915/i915_active.c
> > >>> @@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
> > >>>                rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
> > >>>                rb_insert_color(&ref->cache->node, &ref->tree);
> > >>>                GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
> > >>> +
> > >>> +             /* Make the cached node available for reuse with any timeline */
> > >>> +             if (IS_ENABLED(CONFIG_64BIT))
> > >>> +                     ref->cache->timeline = 0; /* needs cmpxchg(u64) */
> > >>
> > >> Or when fence context wraps shock horror.
> > > 
> > > I more concerned about that we use timeline:0 as a special unordered
> > > timeline. It's reserved by use in the dma_fence_stub, and everything
> > > will start to break when the timelines wrap. The earliest causalities
> > > will be the kernel_context timelines which are also very special indices
> > > for the barriers.
> > > 
> > >>
> > >>>        }
> > >>>    
> > >>>        spin_unlock_irqrestore(&ref->tree_lock, flags);
> > >>> @@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
> > >>>    {
> > >>>        struct active_node *it;
> > >>>    
> > >>> +     GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
> > >>> +
> > >>>        it = READ_ONCE(ref->cache);
> > >>> -     if (it && it->timeline == idx)
> > >>> -             return it;
> > >>> +     if (it) {
> > >>> +             u64 cached = READ_ONCE(it->timeline);
> > >>> +
> > >>> +             if (cached == idx)
> > >>> +                     return it;
> > >>> +
> > >>> +#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
> > >>> +             if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
> > >>> +                     GEM_BUG_ON(i915_active_fence_isset(&it->base));
> > >>> +                     return it;
> > >>
> > >> cpmxchg suggests this needs to be atomic, however above the check for
> > >> equality comes from a separate read.
> > > 
> > > That's fine, and quite common to avoid cmpxchg if the current value
> > > already does not match the expected condition.
> > 
> > How? What is another thread is about to install its idx into 
> > it->timeline with cmpxchg and this thread does not see it because it 
> > just returned on the "cached == idx" condition.
> 
> Because it's nonzero.
> 
> If the idx is already non-zero, it will always remain non-zero until
> everybody idles (and there are no more threads).
> 
> If the idx is zero, it can only transition to non-zero once, atomically
> via cmpxchg. The first and only first cmpxchg will return that the
> previous value was 0, and so return with it->idx == idx.

As for the case that two threads are attempting to install 2 different
fences into the same timeline slot -- that concurrency is controlled
by the timeline mutex, or some other agreed upon serialisation for the
slot [e.g. the exclusive slot doesn't have an intel_timeline associated
with it, and some ranges uses mutexes other than intel_timeline.]
-Chris