[Intel-gfx] [PATCH 43/46] drm/i915: Allocate a status page for each timeline
John Harrison
John.C.Harrison at Intel.com
Tue Jan 15 18:17:21 UTC 2019
On 1/15/2019 01:50, Chris Wilson wrote:
> Quoting John Harrison (2019-01-15 00:56:13)
>> On 1/7/2019 03:55, Chris Wilson wrote:
>>> +static int alloc_hwsp(struct i915_timeline *timeline)
>>> +{
>>> + struct drm_i915_private *i915 = timeline->i915;
>>> + struct i915_vma *vma;
>>> + int offset;
>>> +
>>> + mutex_lock(&i915->gt.timeline_lock);
>>> +
>>> +restart:
>>> + offset = find_first_cacheline(i915);
>>> + if (offset == NBITS && i915->gt.timeline_hwsp) {
>>> + i915_vma_put(i915->gt.timeline_hwsp);
>>> + i915->gt.timeline_hwsp = NULL;
>>> + }
>>> +
>>> + vma = i915->gt.timeline_hwsp;
>>> + if (!vma) {
>>> + struct drm_i915_gem_object *bo;
>>> +
>>> + /* Drop the lock before allocations */
>>> + mutex_unlock(&i915->gt.timeline_lock);
>>> +
>>> + BUILD_BUG_ON(NBITS * CACHELINE_BYTES > PAGE_SIZE);
>>> + bo = i915_gem_object_create_internal(i915, PAGE_SIZE);
>>> + if (IS_ERR(bo))
>>> + return PTR_ERR(bo);
>>> +
>>> + i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
>>> +
>>> + vma = i915_vma_instance(bo, &i915->ggtt.vm, NULL);
>>> + if (IS_ERR(vma))
>>> + return PTR_ERR(vma);
>>> +
>>> + mutex_lock(&i915->gt.timeline_lock);
>>> + if (i915->gt.timeline_hwsp) {
>>> + i915_gem_object_put(bo);
>>> + goto restart;
>>> + }
>>> +
>>> + i915->gt.timeline_hwsp = vma;
>>> + i915->gt.timeline_free = ~0ull;
>>> + offset = 0;
>>> + }
>>> +
>>> + i915->gt.timeline_free &= ~BIT_ULL(offset);
>>> +
>>> + timeline->hwsp_ggtt = i915_vma_get(vma);
>>> + timeline->hwsp_offset = offset * CACHELINE_BYTES;
>>> +
>>> + mutex_unlock(&i915->gt.timeline_lock);
>>> +
>>> + return 0;
>>> +}
>> If I'm reading this correctly then gt.timeline_hwsp/free is a cached
>> copy of the most recently allocated but not yet filled bank of seqno
>> locations. When it gets full, the i915->gt reference gets dropped and a
>> new page is allocated and used up line by line. Meanwhile, each timeline
>> has its own private reference to the page, so dropping the i915->gt
>> reference is safe. And once the last timeline using a given page is
>> freed, the last reference to that page will be dropped and so the page
>> itself will also be freed. If a timeline is freed before the currently
>> cached page is filled, then that timeline's slot will be released and
>> re-used by the next timeline to be created.
>>
>> But what about the scenario of a long running system with a small but
>> growing number of persistent tasks interspersed with many short lived
>> tasks? In that case, you would end up with many sparsely populated pages
>> whose free slots will not get re-used. You could have a linked list
>> of cached pages. When a page is filled, move it to a 'full' list. When a
>> timeline is freed, if its page was on the 'full' list, clear the slot
>> and move it back to the 'available' list.
> Yes. My thinking was a plain slab cache was a quick-and-dirty
> improvement over a page-per-timeline. And a freelist would be the next
> step.
>
>> Or is the idea that a worst case of a single page vma allocation per
>> timeline is the least of our worries if there is an ever growing number
>> of timelines/contexts/users in the system?
> Nah, it was just an attempt to quickly reduce the number of allocations,
> where the worst case of one page+vma per timeline was the starting
> point.
>
> We should break this patch down into 1) one-page-per-timeline, 2) slab
> cache, 3) free list 4) profit.
>
> At other times we have been wanting to be able to suballocate pages,
> something to keep in mind would be extending this to arbitrary cacheline
> allocations.
The multi-stage approach sounds good. Keep things simple in this patch
and then improve the situation later. One thing to be careful of with a
cacheline allocator would be to make sure whatever is being converted
wasn't using full pages for security reasons. I.e. a page can be private
to a process, whereas a cacheline will be shared by many. I guess that
would only really apply to allocations being passed to userland, as the
kernel is considered secure? Or can a user batch buffer write to
arbitrary locations within the ppHWSP and thereby splat someone else's
seqno?
>>> + if (global_hwsp) {
>>> + timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
>>> + timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>>> + } else {
>>> + err = alloc_hwsp(timeline);
>>> + if (err)
>>> + return err;
>>> + }
>>> +
>>> + vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
>>> + if (IS_ERR(vaddr)) { /* leak the cacheline, but will clean up later */
>> Can you explain this comment more? Where/when is the later?
> On failure here, the cacheline is still marked as allocated in the slab,
> but the reference to the page is released. So the backing page will be
> released when everyone else finally drops their reference.
>
> Just laziness, since we have the ability to return the cacheline later
> on...
Meaning the actual leak is the bit in 'i915->gt.timeline_free' that says
this cacheline can or can't be used for the next allocation? Presumably
you could do the bitmap munging in the case that 'global_hwsp' is null,
but the code would certainly be messier for not a lot of gain.
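As a toy model of that 'unleak' path: on pin_map failure after
alloc_hwsp(), the cacheline's bit could be returned to the free mask
under the same lock. The names below mirror the quoted patch, but this
is an illustrative user-space sketch of the idea, not the actual fix;
release_hwsp_cacheline() is a hypothetical helper.

```c
/* Sketch: return a cacheline's bit to the free mask on an error path.
 * In the driver this would be done under gt.timeline_lock. */
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u

static uint64_t timeline_free; /* stands in for i915->gt.timeline_free */

/* Mark a previously allocated cacheline as free again. */
static void release_hwsp_cacheline(unsigned int hwsp_offset)
{
	unsigned int bit = hwsp_offset / CACHELINE_BYTES;

	timeline_free |= 1ull << bit;
}
```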
>
>>> @@ -2616,7 +2628,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>> goto error_deref_obj;
>>> }
>>>
>>> - timeline = i915_timeline_create(ctx->i915, ctx->name);
>>> + timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
>> Why does this use the global HWSP rather than a per context one?
> .global_hwsp = NULL => it allocates its own HWSP.
>
> Were you thinking of intel_engine_setup_common() which is still using
> the global HWSP at this point in time?
Doh. Brain fart. Presumably the engine one will disappear completely? Or
is it still needed for legacy mode?
> -Chris