[Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jul 29 14:22:26 UTC 2020


On 29/07/2020 14:42, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-29 13:40:38)
>>
>> On 28/07/2020 15:28, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-07-17 14:04:58)
>>>>
>>>> On 15/07/2020 12:50, Chris Wilson wrote:
>>>>> Rather than require the next timeline after idling to match the MRU
>>>>> before idling, reset the index on the node and allow it to match the
>>>>> first request. However, this requires cmpxchg(u64) and so is not trivial
>>>>> on 32b, so for compatibility we just fall back to keeping the cached node
>>>>> pointing to the MRU timeline.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
>>>>>     1 file changed, 19 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
>>>>> index 0854b1552bc1..6737b5615c0c 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_active.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_active.c
>>>>> @@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
>>>>>                 rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
>>>>>                 rb_insert_color(&ref->cache->node, &ref->tree);
>>>>>                 GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
>>>>> +
>>>>> +             /* Make the cached node available for reuse with any timeline */
>>>>> +             if (IS_ENABLED(CONFIG_64BIT))
>>>>> +                     ref->cache->timeline = 0; /* needs cmpxchg(u64) */
>>>>
>>>> Or when the fence context wraps, shock horror.
>>>
>>> I'm more concerned that we use timeline:0 as a special unordered
>>> timeline. It's reserved for use by the dma_fence_stub, and everything
>>> will start to break when the timelines wrap. The earliest casualties
>>> will be the kernel_context timelines, which are also very special
>>> indices for the barriers.
>>>
>>>>
>>>>>         }
>>>>>     
>>>>>         spin_unlock_irqrestore(&ref->tree_lock, flags);
>>>>> @@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
>>>>>     {
>>>>>         struct active_node *it;
>>>>>     
>>>>> +     GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
>>>>> +
>>>>>         it = READ_ONCE(ref->cache);
>>>>> -     if (it && it->timeline == idx)
>>>>> -             return it;
>>>>> +     if (it) {
>>>>> +             u64 cached = READ_ONCE(it->timeline);
>>>>> +
>>>>> +             if (cached == idx)
>>>>> +                     return it;
>>>>> +
>>>>> +#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
>>>>> +             if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
>>>>> +                     GEM_BUG_ON(i915_active_fence_isset(&it->base));
>>>>> +                     return it;
>>>>
>>>> cmpxchg suggests this needs to be atomic, however the equality check
>>>> above it comes from a separate read.
>>>
>>> That's fine, and quite common to avoid cmpxchg if the current value
>>> already does not match the expected condition.
>>
>> How? What if another thread is about to install its idx into
>> it->timeline with cmpxchg and this thread does not see it because it
>> just returned on the "cached == idx" condition?
> 
> Because it's nonzero.
> 
> If the idx is already non-zero, it will always remain non-zero until
> everybody idles (and there are no more threads).
> 
> If the idx is zero, it can only transition to non-zero once, atomically
> via cmpxchg. Only the first cmpxchg will see that the previous value was
> 0, and so only that thread returns with it->timeline == idx.

I think this is worthy of a comment, to avoid a future reader having to
re-figure it all out.
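
Something along these lines perhaps (untested sketch on top of your
hunk; the comment wording is mine, so treat it as a starting point
only):

        it = READ_ONCE(ref->cache);
        if (it) {
                u64 cached = READ_ONCE(it->timeline);

                if (cached == idx)
                        return it;

#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
                /*
                 * The cached node is reset to the reserved timeline 0 when
                 * the whole i915_active idles (__active_retire), and once
                 * claimed with a non-zero idx it stays non-zero until the
                 * next idle. So either we already saw our own idx above,
                 * or the slot still reads 0 and exactly one thread can win
                 * the cmpxchg below; the plain read followed by cmpxchg
                 * cannot miss a concurrent claim of the same idx.
                 */
                if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
                        GEM_BUG_ON(i915_active_fence_isset(&it->base));
                        return it;
                }
#endif
        }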

>>>> Since there is a lookup code path under the spinlock, perhaps the
>>>> unlocked lookup could just fail, and then the locked lookup could
>>>> re-assign the timeline without the need for cmpxchg?
>>>
>>> The unlocked/locked lookup are the same routine. You pointed that out
>>> :-p
>>
>> Like I remember from ten days ago.. Anyway, I am pointing out it still
>> doesn't smell right.
>>
>> __active_lookup(...) -> lockless
>> {
>> ...
>>          it = fetch_node(ref->tree.rb_node);
>>          while (it) {
>>                  if (it->timeline < idx) {
>>                          it = fetch_node(it->node.rb_right);
>>                  } else if (it->timeline > idx) {
>>                          it = fetch_node(it->node.rb_left);
>>                  } else {
>>                          WRITE_ONCE(ref->cache, it);
>>                          break;
>>                  }
>>          }
>> ...
>> }
>>
>> Then in active_instance, locked:
>>
>> ...
>>          parent = NULL;
>>          p = &ref->tree.rb_node;
>>          while (*p) {
>>                  parent = *p;
>>
>>                  node = rb_entry(parent, struct active_node, node);
>>                  if (node->timeline == idx) {
>>                          kmem_cache_free(global.slab_cache, prealloc);
>>                          goto out;
>>                  }
>>
>>                  if (node->timeline < idx)
>>                          p = &parent->rb_right;
>>                  else
>>                          p = &parent->rb_left;
>>          }
>> ...
>>
>> Tree walk could be consolidated between the two.
> 
> This tree walk is subtly different, as we aren't just interested in the
> node, but its parent. The exact repetitions have been consolidated into
> __active_lookup.

It returns the previous/parent node if idx is not found, so yeah, a
common helper would need two out parameters: one returning the match (or
NULL), the other returning the previous/parent node. You think that is
not worth it?
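
For illustration, roughly what I have in mind (untested sketch;
__active_walk and its parent out parameter are made-up names, nothing in
the tree):

        /*
         * Hypothetical shared walk: returns the matching node or NULL, and
         * reports the last node visited via *parent so the insertion path
         * can rb_link_node() a new node beneath it.
         */
        static struct active_node *
        __active_walk(struct i915_active *ref, u64 idx,
                      struct rb_node **parent)
        {
                struct rb_node *rb = READ_ONCE(ref->tree.rb_node);

                *parent = NULL;
                while (rb) {
                        struct active_node *node =
                                rb_entry(rb, struct active_node, node);

                        if (node->timeline == idx)
                                return node;

                        *parent = rb;
                        if (node->timeline < idx)
                                rb = READ_ONCE(rb->rb_right);
                        else
                                rb = READ_ONCE(rb->rb_left);
                }

                return NULL;
        }

The match can simply be the return value rather than a second out
parameter, and the locked insertion path can recompute which child link
to use by comparing idx against the parent's timeline before
rb_link_node().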

Regards,

Tvrtko

