[Intel-gfx] [PATCH 7/7] drm/i915/gem: Acquire all vma/objects under reservation_ww_class
Chris Wilson
chris at chris-wilson.co.uk
Wed Jun 24 08:08:27 UTC 2020
Quoting Thomas Hellström (Intel) (2020-06-24 06:42:33)
>
> On 6/23/20 11:15 PM, Chris Wilson wrote:
> > Quoting Thomas Hellström (Intel) (2020-06-23 21:31:38)
> >> On 6/23/20 8:41 PM, Chris Wilson wrote:
> >>> Quoting Thomas Hellström (Intel) (2020-06-23 19:21:28)
> >>>> On 6/23/20 6:36 PM, Chris Wilson wrote:
> >>>>> Quoting Thomas Hellström (Intel) (2020-06-23 12:22:11)
> >>>>>> Hi, Chris,
> >>>>>>
> >>>>>> On 6/22/20 11:59 AM, Chris Wilson wrote:
> >>>>>>> In order to actually handle eviction and what not, we need to process
> >>>>>>> all the objects together under a common lock, reservation_ww_class. As
> >>>>>>> such, do a memory reservation pass after looking up the object/vma,
> >>>>>>> which then feeds into the rest of execbuf [relocation, cmdparsing,
> >>>>>>> flushing and ofc execution].
> >>>>>>>
> >>>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>>>> ---
> >>>>>>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 91 ++++++++++++++-----
> >>>>>>> 1 file changed, 70 insertions(+), 21 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>>>>>> index 46fcbdf8161c..8db2e013465f 100644
> >>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>>>>>> @@ -53,10 +53,9 @@ struct eb_vma_array {
> >>>>>>>
> >>>>>>> #define __EXEC_OBJECT_HAS_PIN BIT(31)
> >>>>>>> #define __EXEC_OBJECT_HAS_FENCE BIT(30)
> >>>>>>> -#define __EXEC_OBJECT_HAS_PAGES BIT(29)
> >>>>>>> -#define __EXEC_OBJECT_NEEDS_MAP BIT(28)
> >>>>>>> -#define __EXEC_OBJECT_NEEDS_BIAS BIT(27)
> >>>>>>> -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
> >>>>>>> +#define __EXEC_OBJECT_NEEDS_MAP BIT(29)
> >>>>>>> +#define __EXEC_OBJECT_NEEDS_BIAS BIT(28)
> >>>>>>> +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */
> >>>>>>>
> >>>>>>> #define __EXEC_HAS_RELOC BIT(31)
> >>>>>>> #define __EXEC_INTERNAL_FLAGS (~0u << 31)
> >>>>>>> @@ -241,6 +240,8 @@ struct i915_execbuffer {
> >>>>>>> struct intel_context *context; /* logical state for the request */
> >>>>>>> struct i915_gem_context *gem_context; /** caller's context */
> >>>>>>>
> >>>>>>> + struct dma_fence *mm_fence;
> >>>>>>> +
> >>>>>>> struct i915_request *request; /** our request to build */
> >>>>>>> struct eb_vma *batch; /** identity of the batch obj/vma */
> >>>>>>> struct i915_vma *trampoline; /** trampoline used for chaining */
> >>>>>>> @@ -331,12 +332,7 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
> >>>>>>> if (ev->flags & __EXEC_OBJECT_HAS_PIN)
> >>>>>>> __i915_vma_unpin(vma);
> >>>>>>>
> >>>>>>> - if (ev->flags & __EXEC_OBJECT_HAS_PAGES)
> >>>>>>> - i915_gem_object_unpin_pages(vma->obj);
> >>>>>>> -
> >>>>>>> - ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
> >>>>>>> - __EXEC_OBJECT_HAS_FENCE |
> >>>>>>> - __EXEC_OBJECT_HAS_PAGES);
> >>>>>>> + ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE);
> >>>>>>> }
> >>>>>>>
> >>>>>>> static void eb_vma_array_destroy(struct kref *kref)
> >>>>>>> @@ -667,6 +663,55 @@ eb_add_vma(struct i915_execbuffer *eb,
> >>>>>>> list_add_tail(&ev->lock_link, &eb->lock);
> >>>>>>> }
> >>>>>>>
> >>>>>>> +static int eb_vma_get_pages(struct i915_execbuffer *eb,
> >>>>>>> + struct eb_vma *ev,
> >>>>>>> + u64 idx)
> >>>>>>> +{
> >>>>>>> + struct i915_vma *vma = ev->vma;
> >>>>>>> + int err;
> >>>>>>> +
> >>>>>>> + /* XXX also preallocate PD for vma */
> >>>>>>> +
> >>>>>>> + err = ____i915_gem_object_get_pages_async(vma->obj);
> >>>>>>> + if (err)
> >>>>>>> + return err;
> >>>>>>> +
> >>>>>>> + return i915_active_ref(&vma->obj->mm.active, idx, eb->mm_fence);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static int eb_reserve_mm(struct i915_execbuffer *eb)
> >>>>>>> +{
> >>>>>>> + const u64 idx = eb->context->timeline->fence_context;
> >>>>>>> + struct ww_acquire_ctx acquire;
> >>>>>>> + struct eb_vma *ev;
> >>>>>>> + int err;
> >>>>>>> +
> >>>>>>> + eb->mm_fence = __dma_fence_create_proxy(0, 0);
> >>>>>>> + if (!eb->mm_fence)
> >>>>>>> + return -ENOMEM;
> >>>>>> Question: eb is local to this thread, right, so eb->mm_fence is not
> >>>>>> considered "published" yet?
> >>>>>>
> >>>>>>> +
> >>>>>>> + ww_acquire_init(&acquire, &reservation_ww_class);
> >>>>>>> +
> >>>>>>> + err = eb_lock_vma(eb, &acquire);
> >>>>>>> + if (err)
> >>>>>>> + goto out;
> >>>>>>> +
> >>>>>>> + ww_acquire_done(&acquire);
> >>>>>>> +
> >>>>>>> + list_for_each_entry(ev, &eb->lock, lock_link) {
> >>>>>>> + struct i915_vma *vma = ev->vma;
> >>>>>>> +
> >>>>>>> + if (err == 0)
> >>>>>>> + err = eb_vma_get_pages(eb, ev, idx);
> >>>>>> I figure this is where you publish the proxy fence? If so, the fence
> >>>>>> signaling critical path starts with this loop,
> >>>>> Hmm, actually at this moment, the fence is still very much internal
> >>>>> being only used as a reference token,
> >>>> I think as long as another thread, running in this driver or another gpu
> >>>> driver can theoretically reference the fence pointer from the
> >>>> reservation object and wait for the fence it's considered published.
> >>> It's not in the reservation object.
> >>>
> >>>> Also the ww_mutexes in this context are really all about grabbing a
> >>>> random set of resources and associate them with a point in a timeline,
> >>>> as the ww_mutexes are released, the fence pointer(s) need to point to
> >>>> published fence(s).
> >>> That's not the purpose of these fences, though. They exist to provide
> >>> reference counting on the backing store, along side the migration fence.
> >>> It's extra detail tacked on the equivalent of bo->moving.
> >>>
> >>> That is not to say that one could build up an async migration chain which
> >>> form a graph back to these, that chain could only be formed once the
> >>> operation itself has been published in the dma_resv though.
> >> Hmm. So let's say another thread grabs one of the just released
> >> ww_mutexes and wants to schedule a blit from one of the buffers in the
> >> current operation with high priority. How would that thread know how to
> >> order that blit operation w r t the current operation?
> > Why would it order?
> So let's say it's an eviction blit, needing to incorporate the data from
> the current operation. Or, for that matter a ttm-style cpu copy eviction:
>
> ww_mutex_lock
> wait_for_idle
> copy
> ww_mutex_unlock
We have a scheduler. Eviction does not block. Submission never blocks.
lock
swap allocation blocks
unlock
copy.
-Chris
More information about the Intel-gfx
mailing list