[Intel-gfx] [PATCH 07/27] drm/i915: Squash repeated awaits on the same fence
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Wed Apr 26 12:13:41 UTC 2017
On 26/04/2017 12:18, Chris Wilson wrote:
> On Wed, Apr 26, 2017 at 11:54:08AM +0100, Tvrtko Ursulin wrote:
>>>>> +/* struct intel_timeline_sync is a layer of a radixtree that maps a u64 fence
>>>>> + * context id to the last u32 fence seqno waited upon from that context.
>>>>> + * Unlike lib/radixtree it uses a parent pointer that allows traversal back to
>>>>> + * the root. This allows us to access the whole tree via a single pointer
>>>>> + * to the most recently used layer. We expect fence contexts to be dense
>>>>> + * and most reuse to be on the same i915_gem_context but on neighbouring
>>>>> + * engines (i.e. on adjacent contexts), and so reuse the same leaf, making
>>>>> + * it a very effective lookup cache. If the new lookup is not on the same
>>>>> + * leaf, we expect it to be on a neighbouring branch.
>>>>> + *
>>>>> + * A leaf holds an array of u32 seqno, and has height 0. The bitmap field
>>>>> + * allows us to store whether a particular seqno is valid (i.e. allows us
>>>>> + * to distinguish unset from 0).
>>>>> + *
>>>>> + * A branch holds an array of layer pointers, and has height > 0, and always
>>>>> + * has at least 2 layers (either branches or leaves) below it.
>>>>> + *
>>>>> + */
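To make the comment above concrete, here is a minimal sketch of one layer
as it describes it - note that ksync_layer and all the field names are
invented for this illustration and are not the patch's actual definitions:

struct ksync_layer {
	struct ksync_layer *parent;	/* walk back up towards the root */
	u64 prefix;			/* context bits above this layer */
	unsigned int height;		/* 0 for a leaf, > 0 for a branch */
	u64 bitmap;			/* leaf only: marks slots holding a valid seqno */
	union {
		u32 seqno[64];			/* leaf: last waited seqno per context */
		struct ksync_layer *child[64];	/* branch: the next layer down */
	};
};

A lookup starts from the most recently used leaf: if the wanted context
shares that leaf's prefix we are done, otherwise we walk up via ->parent
until the prefix matches and then back down the child pointers.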
>>>>
>>>> @_@ :)
>>>>
>>>> Ok, so a map of u64 to u32. We can't use IDR or radixtree directly
>>>> because of u64 keys. :( How about a hash table? It would be much
>>>> simpler to review. :) Seriously, if it would perform close enough it
>>>> would be a much, much simpler implementation.
>>>
>>> You want a resizable hashtable. rht is appallingly slow, so you want a
>>> custom resizable ht. They are not as simple as this codewise ;)
>>> (Plus a compressed radixtree is part of my plan for scalability
>>> improvements for struct reservation_object.)
>>
>> Why resizable? I was thinking of a normal one. If at any given time we
>> have an active set of contexts, or at least if lookups go, as you say
>> below, to neighbouring contexts, that would mean we are talking
>> about lookups to different hash buckets. And would we, for the
>> typical working set, really expect many collisions, and so longer
>> lists in each bucket? So maybe NUM_ENGINES * some typical-load
>> constant number of buckets would not be that bad?
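A minimal sketch of that kind of fixed-size table, using the kernel's
hashtable helpers - sync_entry, sync_find and the bucket count here are
made up for this example, they are not from the patch:

#include <linux/hashtable.h>
#include <linux/types.h>

#define SYNC_HASH_BITS 6	/* 64 buckets, i.e. NUM_ENGINES * some load constant */

struct sync_entry {
	struct hlist_node node;
	u64 context;		/* fence context id, the hash key */
	u32 seqno;		/* last seqno waited upon from that context */
};

static DEFINE_HASHTABLE(sync_hash, SYNC_HASH_BITS);

static struct sync_entry *sync_find(u64 context)
{
	struct sync_entry *e;

	hash_for_each_possible(sync_hash, e, node, context)
		if (e->context == context)
			return e;

	return NULL;
}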
>
> Consider a long-running display server that will accumulate tens of
> thousands of clients in its lifetime, each with their own contexts that
> get shared by passing around fences/framebuffers. (Or, on a shorter
> scale, any of the context stress tests in igt.) Because context ids are
> not recycled, we can grow very large tables - but we have no knowledge
> of which contexts are no longer required.
>
> To compensate we need to occasionally prune the sync points. For a ht we
> could just scrap it; for an idr, we could store the last use and delete
> stale leaves.

Hm, pruning, yes... but you don't have pruning in this patch either. So
that's something that needs to be addressed either way.
> But first there is the question of how many buckets to give the static
> ht. Most processes will be sharing between 2 contexts (render context,
> presentation context), except for a display server, which may have
> 10-100 clients - and that is possibly where eliminating repeated syncs
> is going to be most valuable. That suggests 256 buckets for every
> timeline (assuming just a pair of engines across shared contexts).
> Overkill for the majority, and going to be miserable in stress tests.
>
> Finally, what do you store in the ht? Fences are the obvious candidate,
> but they need reaping. Or you just create a ht for the request and only
> squash repeated fences inside a single request - that doesn't benefit
> from timeline tracking and squashing between requests (but does avoid
> keeping fences around forever). Hence I went with tracking seqnos. To
> avoid allocations for the ht nodes, we could create an open-addressed ht
> with the {context, seqno} embedded in it. It would be efficient, but
> needs online resizing and is a fair chunk of new code.
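For illustration, a rough sketch of what such an open-addressed table
could look like - seqno_ht, seqno_ht_insert and the use of 0 as the
"empty" context are assumptions made for this example, and the online
resizing is exactly the part left out:

#include <linux/hash.h>
#include <linux/log2.h>
#include <linux/types.h>

struct seqno_ht {
	struct seqno_slot {
		u64 context;	/* 0 taken to mean "empty slot" here */
		u32 seqno;
	} *slots;
	unsigned int size;	/* power of two */
};

static bool seqno_ht_insert(struct seqno_ht *ht, u64 context, u32 seqno)
{
	unsigned int i = hash_64(context, ilog2(ht->size));
	unsigned int n;

	for (n = 0; n < ht->size; n++, i = (i + 1) & (ht->size - 1)) {
		if (!ht->slots[i].context || ht->slots[i].context == context) {
			ht->slots[i].context = context;
			ht->slots[i].seqno = seqno;
			return true;
		}
	}

	return false;	/* table full - this is where online resizing would go */
}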

I was thinking of storing exactly the same thing as this patch does: u64
context id as the key, u32 seqnos (wrapped in a container with an
hlist_node).

Ok, so the key issue is pruning, to keep the display server scenario in
check. Free the key from i915_fence_release?
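As a sketch of that pruning idea (names invented again, and whether
i915_fence_release is the right hook - and what locking it would need -
is exactly the open question), the cached entry for a context could
simply be dropped when its fence goes away, reusing sync_find() and
sync_hash from the earlier sketch:

static void sync_prune(u64 context)
{
	struct sync_entry *e = sync_find(context);

	if (e) {
		hash_del(&e->node);	/* unlink from its bucket */
		kfree(e);		/* kfree() from <linux/slab.h> */
	}
}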
Regards,
Tvrtko