[Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline

Chris Wilson chris at chris-wilson.co.uk
Wed Jul 29 13:42:06 UTC 2020


Quoting Tvrtko Ursulin (2020-07-29 13:40:38)
> 
> On 28/07/2020 15:28, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-07-17 14:04:58)
> >>
> >> On 15/07/2020 12:50, Chris Wilson wrote:
> >>> Rather than require the next timeline after idling to match the MRU
> >>> before idling, reset the index on the node and allow it to match the
> >>> first request. However, this requires cmpxchg(u64) and so is not trivial
> >>> on 32b, so for compatibility we just fall back to keeping the cached node
> >>> pointing to the MRU timeline.
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
> >>>    1 file changed, 19 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> >>> index 0854b1552bc1..6737b5615c0c 100644
> >>> --- a/drivers/gpu/drm/i915/i915_active.c
> >>> +++ b/drivers/gpu/drm/i915/i915_active.c
> >>> @@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
> >>>                rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
> >>>                rb_insert_color(&ref->cache->node, &ref->tree);
> >>>                GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
> >>> +
> >>> +             /* Make the cached node available for reuse with any timeline */
> >>> +             if (IS_ENABLED(CONFIG_64BIT))
> >>> +                     ref->cache->timeline = 0; /* needs cmpxchg(u64) */
> >>
> >> Or when the fence context wraps, shock horror.
> > 
> > I'm more concerned that we use timeline:0 as a special unordered
> > timeline. It's reserved for use by the dma_fence_stub, and everything
> > will start to break when the timelines wrap. The earliest casualties
> > will be the kernel_context timelines, which are also very special indices
> > for the barriers.
> > 
> >>
> >>>        }
> >>>    
> >>>        spin_unlock_irqrestore(&ref->tree_lock, flags);
> >>> @@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
> >>>    {
> >>>        struct active_node *it;
> >>>    
> >>> +     GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
> >>> +
> >>>        it = READ_ONCE(ref->cache);
> >>> -     if (it && it->timeline == idx)
> >>> -             return it;
> >>> +     if (it) {
> >>> +             u64 cached = READ_ONCE(it->timeline);
> >>> +
> >>> +             if (cached == idx)
> >>> +                     return it;
> >>> +
> >>> +#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
> >>> +             if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
> >>> +                     GEM_BUG_ON(i915_active_fence_isset(&it->base));
> >>> +                     return it;
> >>
> >> cmpxchg suggests this needs to be atomic; however, above, the check for
> >> equality comes from a separate read.
> > 
> > That's fine; it is quite common to avoid the cmpxchg when the current
> > value already does not match the expected condition.
> 
> How? What if another thread is about to install its idx into
> it->timeline with cmpxchg and this thread does not see it, because it
> just returned on the "cached == idx" condition?

Because it's nonzero.

If the idx is already non-zero, it will always remain non-zero until
everybody idles (and there are no more threads).

If the idx is zero, it can only transition to non-zero once, atomically
via cmpxchg. The first, and only the first, cmpxchg will observe that the
previous value was 0, and so only that caller returns with
it->timeline == idx.
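
To make that concrete, here is a minimal sketch of the lockless cache-hit
path, mirroring the hunk quoted above (the helper name cache_hit is made up
for illustration; the GEM_BUG_ONs and the 32b fallback are elided):

static struct active_node *cache_hit(struct i915_active *ref, u64 idx)
{
	struct active_node *it = READ_ONCE(ref->cache);
	u64 cached;

	if (!it)
		return NULL;

	cached = READ_ONCE(it->timeline);
	if (cached == idx)
		return it; /* non-zero and matching: stable until idle */

	/*
	 * 0 -> idx can happen at most once; only the single winning
	 * cmpxchg sees 0 as the old value and claims the cached node.
	 */
	if (!cached && !cmpxchg(&it->timeline, 0, idx))
		return it;

	return NULL; /* miss: fall back to the rbtree walk */
}

The plain read is only an optimisation to skip the cmpxchg when it cannot
possibly succeed; correctness rests solely on the single 0 -> idx transition.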

> 
> > 
> >> Since there is a lookup code path under the spinlock, perhaps the
> >> unlocked lookup could just fail, and then the locked lookup could
> >> re-assign the timeline without the need for cmpxchg?
> > 
> > The unlocked/locked lookups are the same routine. You pointed that out
> > :-p
> 
> Like I remember from ten days ago... Anyway, I am pointing out that it
> still doesn't smell right.
> 
> __active_lookup(...) -> lockless
> {
> ...
>         it = fetch_node(ref->tree.rb_node);
>         while (it) {
>                 if (it->timeline < idx) {
>                         it = fetch_node(it->node.rb_right);
>                 } else if (it->timeline > idx) {
>                         it = fetch_node(it->node.rb_left);
>                 } else {
>                         WRITE_ONCE(ref->cache, it);
>                         break;
>                 }
>         }
> ...
> }
> 
> Then in active_instance, locked:
> 
> ...
>         parent = NULL;
>         p = &ref->tree.rb_node;
>         while (*p) {
>                 parent = *p;
> 
>                 node = rb_entry(parent, struct active_node, node);
>                 if (node->timeline == idx) {
>                         kmem_cache_free(global.slab_cache, prealloc);
>                         goto out;
>                 }
> 
>                 if (node->timeline < idx)
>                         p = &parent->rb_right;
>                 else
>                         p = &parent->rb_left;
>         }
> ...
> 
> Tree walk could be consolidated between the two.

This tree walk is subtly different, as we aren't just interested in the
node, but also in its parent. The exact repetitions have been consolidated
into __active_lookup.
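
For reference, a rough sketch of the locked insertion walk (paraphrasing
active_instance(), not verbatim; the helper name insert_node is made up),
showing why the parent and link slot have to fall out of the walk:

static struct active_node *
insert_node(struct i915_active *ref, u64 idx, struct active_node *prealloc)
{
	struct rb_node **p = &ref->tree.rb_node, *parent = NULL;
	struct active_node *node;

	while (*p) {
		parent = *p;
		node = rb_entry(parent, struct active_node, node);
		if (node->timeline == idx)
			return node; /* caller frees prealloc */

		if (node->timeline < idx)
			p = &parent->rb_right;
		else
			p = &parent->rb_left;
	}

	/* The insertion needs both the parent and the link slot. */
	prealloc->timeline = idx;
	rb_link_node(&prealloc->node, parent, p);
	rb_insert_color(&prealloc->node, &ref->tree);
	return prealloc;
}

The lockless lookup only ever returns the matching node, so it can share
the comparison logic but not the rb_link_node() bookkeeping.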
-Chris

