[Intel-gfx] [PATCH 7/7] drm/i915/gem: Acquire all vma/objects under reservation_ww_class
Thomas Hellström (Intel)
thomas_os at shipmail.org
Tue Jun 23 20:31:38 UTC 2020
On 6/23/20 8:41 PM, Chris Wilson wrote:
> Quoting Thomas Hellström (Intel) (2020-06-23 19:21:28)
>> On 6/23/20 6:36 PM, Chris Wilson wrote:
>>> Quoting Thomas Hellström (Intel) (2020-06-23 12:22:11)
>>>> Hi, Chris,
>>>>
>>>> On 6/22/20 11:59 AM, Chris Wilson wrote:
>>>>> In order to actually handle eviction and what not, we need to process
>>>>> all the objects together under a common lock, reservation_ww_class. As
>>>>> such, do a memory reservation pass after looking up the object/vma,
>>>>> which then feeds into the rest of execbuf [relocation, cmdparsing,
>>>>> flushing and ofc execution].
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> ---
>>>>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 91 ++++++++++++++-----
>>>>> 1 file changed, 70 insertions(+), 21 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> index 46fcbdf8161c..8db2e013465f 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>>> @@ -53,10 +53,9 @@ struct eb_vma_array {
>>>>>
>>>>> #define __EXEC_OBJECT_HAS_PIN BIT(31)
>>>>> #define __EXEC_OBJECT_HAS_FENCE BIT(30)
>>>>> -#define __EXEC_OBJECT_HAS_PAGES BIT(29)
>>>>> -#define __EXEC_OBJECT_NEEDS_MAP BIT(28)
>>>>> -#define __EXEC_OBJECT_NEEDS_BIAS BIT(27)
>>>>> -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
>>>>> +#define __EXEC_OBJECT_NEEDS_MAP BIT(29)
>>>>> +#define __EXEC_OBJECT_NEEDS_BIAS BIT(28)
>>>>> +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */
>>>>>
>>>>> #define __EXEC_HAS_RELOC BIT(31)
>>>>> #define __EXEC_INTERNAL_FLAGS (~0u << 31)
>>>>> @@ -241,6 +240,8 @@ struct i915_execbuffer {
>>>>> struct intel_context *context; /* logical state for the request */
>>>>> struct i915_gem_context *gem_context; /** caller's context */
>>>>>
>>>>> + struct dma_fence *mm_fence;
>>>>> +
>>>>> struct i915_request *request; /** our request to build */
>>>>> struct eb_vma *batch; /** identity of the batch obj/vma */
>>>>> struct i915_vma *trampoline; /** trampoline used for chaining */
>>>>> @@ -331,12 +332,7 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
>>>>> if (ev->flags & __EXEC_OBJECT_HAS_PIN)
>>>>> __i915_vma_unpin(vma);
>>>>>
>>>>> - if (ev->flags & __EXEC_OBJECT_HAS_PAGES)
>>>>> - i915_gem_object_unpin_pages(vma->obj);
>>>>> -
>>>>> - ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
>>>>> - __EXEC_OBJECT_HAS_FENCE |
>>>>> - __EXEC_OBJECT_HAS_PAGES);
>>>>> + ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE);
>>>>> }
>>>>>
>>>>> static void eb_vma_array_destroy(struct kref *kref)
>>>>> @@ -667,6 +663,55 @@ eb_add_vma(struct i915_execbuffer *eb,
>>>>> list_add_tail(&ev->lock_link, &eb->lock);
>>>>> }
>>>>>
>>>>> +static int eb_vma_get_pages(struct i915_execbuffer *eb,
>>>>> + struct eb_vma *ev,
>>>>> + u64 idx)
>>>>> +{
>>>>> + struct i915_vma *vma = ev->vma;
>>>>> + int err;
>>>>> +
>>>>> + /* XXX also preallocate PD for vma */
>>>>> +
>>>>> + err = ____i915_gem_object_get_pages_async(vma->obj);
>>>>> + if (err)
>>>>> + return err;
>>>>> +
>>>>> + return i915_active_ref(&vma->obj->mm.active, idx, eb->mm_fence);
>>>>> +}
>>>>> +
>>>>> +static int eb_reserve_mm(struct i915_execbuffer *eb)
>>>>> +{
>>>>> + const u64 idx = eb->context->timeline->fence_context;
>>>>> + struct ww_acquire_ctx acquire;
>>>>> + struct eb_vma *ev;
>>>>> + int err;
>>>>> +
>>>>> + eb->mm_fence = __dma_fence_create_proxy(0, 0);
>>>>> + if (!eb->mm_fence)
>>>>> + return -ENOMEM;
>>>> Question: eb is local to this thread, right, so eb->mm_fence is not
>>>> considered "published" yet?
>>>>
>>>>> +
>>>>> + ww_acquire_init(&acquire, &reservation_ww_class);
>>>>> +
>>>>> + err = eb_lock_vma(eb, &acquire);
>>>>> + if (err)
>>>>> + goto out;
>>>>> +
>>>>> + ww_acquire_done(&acquire);
>>>>> +
>>>>> + list_for_each_entry(ev, &eb->lock, lock_link) {
>>>>> + struct i915_vma *vma = ev->vma;
>>>>> +
>>>>> + if (err == 0)
>>>>> + err = eb_vma_get_pages(eb, ev, idx);
>>>> I figure this is where you publish the proxy fence? If so, the fence
>>>> signaling critical path starts with this loop.
>>> Hmm, actually at this moment the fence is still very much internal,
>>> being used only as a reference token.
>> I think that as long as another thread, running in this driver or another
>> GPU driver, can theoretically reference the fence pointer from the
>> reservation object and wait for the fence, it's considered published.
> It's not in the reservation object.
>
>> Also, the ww_mutexes in this context are really all about grabbing a
>> random set of resources and associating them with a point on a timeline;
>> as the ww_mutexes are released, the fence pointer(s) need to point to
>> published fence(s).
> That's not the purpose of these fences, though. They exist to provide
> reference counting on the backing store, alongside the migration fence.
> It's extra detail tacked onto the equivalent of bo->moving.
>
> That is not to say that one could build up an async migration chain which
> forms a graph back to these; that chain could only be formed once the
> operation itself has been published in the dma_resv, though.
Hmm. So let's say another thread grabs one of the just-released
ww_mutexes and wants to schedule a blit from one of the buffers in the
current operation with high priority. How would that thread know how to
order that blit operation w.r.t. the current operation?
/Thomas
> -Chris