[Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline

Wed Jul 29 12:40:38 UTC 2020

On 28/07/2020 15:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-17 14:04:58)
>>
>> On 15/07/2020 12:50, Chris Wilson wrote:
>>> Rather than require the next timeline after idling to match the MRU
>>> before idling, reset the index on the node and allow it to match the
>>> first request. However, this requires cmpxchg(u64) and so is not trivial
>>> on 32b, so for compatibility we just fallback to keeping the cached node
>>> pointing to the MRU timeline.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
>>>    1 file changed, 19 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
>>> index 0854b1552bc1..6737b5615c0c 100644
>>> --- a/drivers/gpu/drm/i915/i915_active.c
>>> +++ b/drivers/gpu/drm/i915/i915_active.c
>>> @@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
>>>                rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
>>>                rb_insert_color(&ref->cache->node, &ref->tree);
>>>                GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
>>> +
>>> +             /* Make the cached node available for reuse with any timeline */
>>> +             if (IS_ENABLED(CONFIG_64BIT))
>>> +                     ref->cache->timeline = 0; /* needs cmpxchg(u64) */
>>
>> Or when fence context wraps shock horror.
> 
> I more concerned about that we use timeline:0 as a special unordered
> timeline. It's reserved by use in the dma_fence_stub, and everything
> will start to break when the timelines wrap. The earliest causalities
> will be the kernel_context timelines which are also very special indices
> for the barriers.
> 
>>
>>>        }
>>>    
>>>        spin_unlock_irqrestore(&ref->tree_lock, flags);
>>> @@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
>>>    {
>>>        struct active_node *it;
>>>    
>>> +     GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
>>> +
>>>        it = READ_ONCE(ref->cache);
>>> -     if (it && it->timeline == idx)
>>> -             return it;
>>> +     if (it) {
>>> +             u64 cached = READ_ONCE(it->timeline);
>>> +
>>> +             if (cached == idx)
>>> +                     return it;
>>> +
>>> +#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
>>> +             if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
>>> +                     GEM_BUG_ON(i915_active_fence_isset(&it->base));
>>> +                     return it;
>>
>> cpmxchg suggests this needs to be atomic, however above the check for
>> equality comes from a separate read.
> 
> That's fine, and quite common to avoid cmpxchg if the current value
> already does not match the expected condition.

How? What is another thread is about to install its idx into 
it->timeline with cmpxchg and this thread does not see it because it 
just returned on the "cached == idx" condition.

> 
>> Since there is a lookup code path under the spinlock, perhaps the
>> unlocked lookup could just fail, and then locked lookup could re-assign
>> the timeline without the need for cmpxchg?
> 
> The unlocked/locked lookup are the same routine. You pointed that out
> :-p

Like I remember from ten days ago.. Anyway, I am pointing out it still 
doesn't smell right.

__active_lookup(...) -> lockless
{
...
	it = fetch_node(ref->tree.rb_node);
	while (it) {
		if (it->timeline < idx) {
			it = fetch_node(it->node.rb_right);
		} else if (it->timeline > idx) {
			it = fetch_node(it->node.rb_left);
		} else {
			WRITE_ONCE(ref->cache, it);
			break;
		}
	}
...
}

Then in active_instance, locked:

...
	parent = NULL;
	p = &ref->tree.rb_node;
	while (*p) {
		parent = *p;

		node = rb_entry(parent, struct active_node, node);
		if (node->timeline == idx) {
			kmem_cache_free(global.slab_cache, prealloc);
			goto out;
		}

		if (node->timeline < idx)
			p = &parent->rb_right;
		else
			p = &parent->rb_left;
			WRITE_ONCE(ref->cache, it);
			break;
		}
	}
...

Tree walk could be consolidated between the two.

Regards,

Tvrtko