[Intel-gfx] [PATCH 43/46] drm/i915: Allocate a status page for each timeline
John Harrison
John.C.Harrison at Intel.com
Tue Jan 15 18:17:21 UTC 2019
On 1/15/2019 01:50, Chris Wilson wrote:
> Quoting John Harrison (2019-01-15 00:56:13)
>> On 1/7/2019 03:55, Chris Wilson wrote:
>>> +static int alloc_hwsp(struct i915_timeline *timeline)
>>> +{
>>> + struct drm_i915_private *i915 = timeline->i915;
>>> + struct i915_vma *vma;
>>> + int offset;
>>> +
>>> + mutex_lock(&i915->gt.timeline_lock);
>>> +
>>> +restart:
>>> + offset = find_first_cacheline(i915);
>>> + if (offset == NBITS && i915->gt.timeline_hwsp) {
>>> + i915_vma_put(i915->gt.timeline_hwsp);
>>> + i915->gt.timeline_hwsp = NULL;
>>> + }
>>> +
>>> + vma = i915->gt.timeline_hwsp;
>>> + if (!vma) {
>>> + struct drm_i915_gem_object *bo;
>>> +
>>> + /* Drop the lock before allocations */
>>> + mutex_unlock(&i915->gt.timeline_lock);
>>> +
>>> + BUILD_BUG_ON(NBITS * CACHELINE_BYTES > PAGE_SIZE);
>>> + bo = i915_gem_object_create_internal(i915, PAGE_SIZE);
>>> + if (IS_ERR(bo))
>>> + return PTR_ERR(bo);
>>> +
>>> + i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
>>> +
>>> + vma = i915_vma_instance(bo, &i915->ggtt.vm, NULL);
>>> + if (IS_ERR(vma))
>>> + return PTR_ERR(vma);
>>> +
>>> + mutex_lock(&i915->gt.timeline_lock);
>>> + if (i915->gt.timeline_hwsp) {
>>> + i915_gem_object_put(bo);
>>> + goto restart;
>>> + }
>>> +
>>> + i915->gt.timeline_hwsp = vma;
>>> + i915->gt.timeline_free = ~0ull;
>>> + offset = 0;
>>> + }
>>> +
>>> + i915->gt.timeline_free &= ~BIT_ULL(offset);
>>> +
>>> + timeline->hwsp_ggtt = i915_vma_get(vma);
>>> + timeline->hwsp_offset = offset * CACHELINE_BYTES;
>>> +
>>> + mutex_unlock(&i915->gt.timeline_lock);
>>> +
>>> + return 0;
>>> +}
>> If I'm reading this correctly then gt.timeline_hwsp/free is a cached
>> copy of the most recently allocated but not yet filled bank of seqno
>> locations. When it gets full, the i915->gt reference gets dropped and a
>> new page is allocated and used up line by line. Meanwhile, each timeline
>> has its own private reference to the page, so dropping the i915->gt
>> reference is safe. And once the last timeline using a given page is
>> freed, the last reference to that page will be dropped and so the page
>> itself will also be freed. If a timeline is freed before the currently
>> cached page is filled, then that timeline's slot will be released and
>> re-used by the next timeline to be created.
>>
>> But what about the scenario of a long running system with a small but
>> growing number of persistent tasks interspersed with many short lived
>> tasks? In that case, you would end up with many sparsely populated pages
>> whose free slots will not get re-used. You could have a linked list
>> of cached pages. When a page is filled, move it to a 'full' list. When a
>> timeline is freed, if its page was on the 'full' list, clear the slot
>> and move it back to the 'available' list.
> Yes. My thinking was a plain slab cache was a quick-and-dirty
> improvement over a page-per-timeline. And a freelist would be the next
> step.
>
>> Or is the idea that a worst case of a single page vma allocation per
>> timeline is the least of our worries if there is an ever growing number
>> of timelines/contexts/users in the system?
> Nah, it was just an attempt to quickly reduce the number of allocations,
> where the worst case of one page+vma per timeline was the starting
> point.
>
> We should break this patch down into 1) one-page-per-timeline, 2) slab
> cache, 3) free list 4) profit.
>
> At other times we have been wanting to be able to suballocate pages,
> something to keep in mind would be extending this to arbitrary cacheline
> allocations.
The multi-stage approach sounds good. Keep things simple in this patch
and then improve the situation later. One thing to be careful of with a
cacheline allocator would be to make sure whatever is being converted
wasn't using full pages for security reasons. I.e. a page can be private
to a process, whereas a cacheline will be shared by many. I guess that
would only really apply to allocations being passed to userland, as the
kernel is considered secure? Or can a user batch buffer write to
arbitrary locations within the ppHWSP and thereby splat someone else's
seqno?
>>> + if (global_hwsp) {
>>> + timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
>>> + timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>>> + } else {
>>> + err = alloc_hwsp(timeline);
>>> + if (err)
>>> + return err;
>>> + }
>>> +
>>> + vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
>>> + if (IS_ERR(vaddr)) { /* leak the cacheline, but will clean up later */
>> Can you explain this comment more? Where/when is the later?
> On failure here, the cacheline is still marked as allocated in the slab,
> but the reference to the page is released. So the backing page will be
> released when everyone else finally drops their reference.
>
> Just laziness, since we have the ability to return the cacheline later
> on...
Meaning the actual leak is the bit in 'i915->gt.timeline_free' that says
this cacheline can or can't be used for the next allocation? Presumably
you could do the bitmap munging in the case that 'global_hwsp' is null,
but the code would certainly be messier for not a lot of gain.
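As a toy model of that 'unleak' path: on pin_map failure after
alloc_hwsp(), the cacheline's bit could be returned to the free mask
under the same lock. The names below mirror the quoted patch, but this
is an illustrative user-space sketch of the idea, not the actual fix;
release_hwsp_cacheline() is a hypothetical helper.

```c
/* Sketch: return a cacheline's bit to the free mask on an error path.
 * In the driver this would be done under gt.timeline_lock. */
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u

static uint64_t timeline_free; /* stands in for i915->gt.timeline_free */

/* Mark a previously allocated cacheline as free again. */
static void release_hwsp_cacheline(unsigned int hwsp_offset)
{
	unsigned int bit = hwsp_offset / CACHELINE_BYTES;

	timeline_free |= 1ull << bit;
}
```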
>
>>> @@ -2616,7 +2628,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>> goto error_deref_obj;
>>> }
>>>
>>> - timeline = i915_timeline_create(ctx->i915, ctx->name);
>>> + timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
>> Why does this use the global HWSP rather than a per context one?
> .global_hwsp = NULL => it allocates its own HWSP.
>
> Were you thinking of intel_engine_setup_common() which is still using
> the global HWSP at this point in time?
Doh. Brain fart. Presumably the engine one will disappear completely? Or
is it still needed for legacy mode?
> -Chris