[Intel-gfx] [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Jul 29 12:40:38 UTC 2020
On 28/07/2020 15:28, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-17 14:04:58)
>>
>> On 15/07/2020 12:50, Chris Wilson wrote:
>>> Rather than require the next timeline after idling to match the MRU
>>> before idling, reset the index on the node and allow it to match the
>>> first request. However, this requires cmpxchg(u64) and so is not trivial
>>> on 32b, so for compatibility we just fallback to keeping the cached node
>>> pointing to the MRU timeline.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> ---
>>> drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
>>> 1 file changed, 19 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
>>> index 0854b1552bc1..6737b5615c0c 100644
>>> --- a/drivers/gpu/drm/i915/i915_active.c
>>> +++ b/drivers/gpu/drm/i915/i915_active.c
>>> @@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
>>> rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
>>> rb_insert_color(&ref->cache->node, &ref->tree);
>>> GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
>>> +
>>> + /* Make the cached node available for reuse with any timeline */
>>> + if (IS_ENABLED(CONFIG_64BIT))
>>> + ref->cache->timeline = 0; /* needs cmpxchg(u64) */
>>
>> Or when the fence context wraps, shock horror.
>
> I'm more concerned that we use timeline:0 as a special unordered
> timeline. It's reserved for use by the dma_fence_stub, and everything
> will start to break when the timelines wrap. The earliest casualties
> will be the kernel_context timelines, which are also very special indices
> for the barriers.
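(For reference, and quoting from memory of drivers/dma-buf/dma-fence.c
rather than from this series: the stub fence does claim context 0 when
it is first handed out, roughly along the lines of

	/* sketch from memory, not part of this patch */
	dma_fence_init(&dma_fence_stub, &dma_fence_stub_ops,
		       &dma_fence_stub_lock, 0, 0); /* context 0, seqno 0 */
	dma_fence_signal_locked(&dma_fence_stub);

so 0 really is already spoken for as the "no particular timeline"
context.)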
>
>>
>>> }
>>>
>>> spin_unlock_irqrestore(&ref->tree_lock, flags);
>>> @@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
>>> {
>>> struct active_node *it;
>>>
>>> + GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
>>> +
>>> it = READ_ONCE(ref->cache);
>>> - if (it && it->timeline == idx)
>>> - return it;
>>> + if (it) {
>>> + u64 cached = READ_ONCE(it->timeline);
>>> +
>>> + if (cached == idx)
>>> + return it;
>>> +
>>> +#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
>>> + if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
>>> + GEM_BUG_ON(i915_active_fence_isset(&it->base));
>>> + return it;
>>
>> cmpxchg suggests this needs to be atomic, however above the check for
>> equality comes from a separate read.
>
> That's fine, and quite common to avoid cmpxchg if the current value
> already does not match the expected condition.
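Spelling out the pattern as I read it, as a generic sketch rather than
the exact hunk above:

	u64 cached = READ_ONCE(it->timeline);

	if (cached == idx)
		return it;	/* node already belongs to this timeline */

	if (cached != 0)
		return NULL;	/* claimed by someone else, cmpxchg would fail */

	if (cmpxchg(&it->timeline, 0, idx) == 0)
		return it;	/* we won the race to claim the idle node */

	return NULL;		/* lost the race, fall back to the tree walk */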
How? What if another thread is about to install its idx into
it->timeline with cmpxchg and this thread does not see it, because it
just returned on the "cached == idx" condition?
>
>> Since there is a lookup code path under the spinlock, perhaps the
>> unlocked lookup could just fail, and then locked lookup could re-assign
>> the timeline without the need for cmpxchg?
>
> The unlocked/locked lookup are the same routine. You pointed that out
> :-p
As if I remember from ten days ago... Anyway, I am pointing out that it
still doesn't smell right.
__active_lookup(...) -> lockless
{
...
	it = fetch_node(ref->tree.rb_node);
	while (it) {
		if (it->timeline < idx) {
			it = fetch_node(it->node.rb_right);
		} else if (it->timeline > idx) {
			it = fetch_node(it->node.rb_left);
		} else {
			WRITE_ONCE(ref->cache, it);
			break;
		}
	}
...
}
Then in active_instance, locked:
...
	parent = NULL;
	p = &ref->tree.rb_node;
	while (*p) {
		parent = *p;

		node = rb_entry(parent, struct active_node, node);
		if (node->timeline == idx) {
			kmem_cache_free(global.slab_cache, prealloc);
			goto out;
		}

		if (node->timeline < idx)
			p = &parent->rb_right;
		else
			p = &parent->rb_left;
	}
...
Tree walk could be consolidated between the two.
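Something along these lines is what I have in mind; the __active_find
name and signature are invented here for illustration only, and the
lockless caller would still need the fetch_node()/READ_ONCE flavour of
the descent:

static struct active_node *
__active_find(struct i915_active *ref, u64 idx,
	      struct rb_node **parent, struct rb_node ***link)
{
	struct rb_node **p = &ref->tree.rb_node;

	*parent = NULL;
	while (*p) {
		struct active_node *node =
			rb_entry(*p, struct active_node, node);

		if (node->timeline == idx)
			return node;

		*parent = *p;
		if (node->timeline < idx)
			p = &(*p)->rb_right;
		else
			p = &(*p)->rb_left;
	}

	*link = p;	/* insertion point for the locked caller */
	return NULL;
}

__active_lookup() would only care about the returned node (and could
update ref->cache), while active_instance() could feed parent/link
straight into rb_link_node() when nothing is found.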
Regards,
Tvrtko