[Intel-gfx] [PATCH 7/7] drm/i915/gem: Acquire all vma/objects under reservation_ww_class

Tue Jun 23 16:36:18 UTC 2020

Quoting Thomas Hellström (Intel) (2020-06-23 12:22:11)
> Hi, Chris,
> 
> On 6/22/20 11:59 AM, Chris Wilson wrote:
> > In order to actually handle eviction and what not, we need to process
> > all the objects together under a common lock, reservation_ww_class. As
> > such, do a memory reservation pass after looking up the object/vma,
> > which then feeds into the rest of execbuf [relocation, cmdparsing,
> > flushing and ofc execution].
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 91 ++++++++++++++-----
> >   1 file changed, 70 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 46fcbdf8161c..8db2e013465f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -53,10 +53,9 @@ struct eb_vma_array {
> >   
> >   #define __EXEC_OBJECT_HAS_PIN               BIT(31)
> >   #define __EXEC_OBJECT_HAS_FENCE             BIT(30)
> > -#define __EXEC_OBJECT_HAS_PAGES              BIT(29)
> > -#define __EXEC_OBJECT_NEEDS_MAP              BIT(28)
> > -#define __EXEC_OBJECT_NEEDS_BIAS     BIT(27)
> > -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
> > +#define __EXEC_OBJECT_NEEDS_MAP              BIT(29)
> > +#define __EXEC_OBJECT_NEEDS_BIAS     BIT(28)
> > +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */
> >   
> >   #define __EXEC_HAS_RELOC    BIT(31)
> >   #define __EXEC_INTERNAL_FLAGS       (~0u << 31)
> > @@ -241,6 +240,8 @@ struct i915_execbuffer {
> >       struct intel_context *context; /* logical state for the request */
> >       struct i915_gem_context *gem_context; /** caller's context */
> >   
> > +     struct dma_fence *mm_fence;
> > +
> >       struct i915_request *request; /** our request to build */
> >       struct eb_vma *batch; /** identity of the batch obj/vma */
> >       struct i915_vma *trampoline; /** trampoline used for chaining */
> > @@ -331,12 +332,7 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
> >       if (ev->flags & __EXEC_OBJECT_HAS_PIN)
> >               __i915_vma_unpin(vma);
> >   
> > -     if (ev->flags & __EXEC_OBJECT_HAS_PAGES)
> > -             i915_gem_object_unpin_pages(vma->obj);
> > -
> > -     ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
> > -                    __EXEC_OBJECT_HAS_FENCE |
> > -                    __EXEC_OBJECT_HAS_PAGES);
> > +     ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE);
> >   }
> >   
> >   static void eb_vma_array_destroy(struct kref *kref)
> > @@ -667,6 +663,55 @@ eb_add_vma(struct i915_execbuffer *eb,
> >       list_add_tail(&ev->lock_link, &eb->lock);
> >   }
> >   
> > +static int eb_vma_get_pages(struct i915_execbuffer *eb,
> > +                         struct eb_vma *ev,
> > +                         u64 idx)
> > +{
> > +     struct i915_vma *vma = ev->vma;
> > +     int err;
> > +
> > +     /* XXX also preallocate PD for vma */
> > +
> > +     err = ____i915_gem_object_get_pages_async(vma->obj);
> > +     if (err)
> > +             return err;
> > +
> > +     return i915_active_ref(&vma->obj->mm.active, idx, eb->mm_fence);
> > +}
> > +
> > +static int eb_reserve_mm(struct i915_execbuffer *eb)
> > +{
> > +     const u64 idx = eb->context->timeline->fence_context;
> > +     struct ww_acquire_ctx acquire;
> > +     struct eb_vma *ev;
> > +     int err;
> > +
> > +     eb->mm_fence = __dma_fence_create_proxy(0, 0);
> > +     if (!eb->mm_fence)
> > +             return -ENOMEM;
> 
> Question: eb is local to this thread, right, so eb->mm_fence is not 
> considered "published" yet?
> 
> > +
> > +     ww_acquire_init(&acquire, &reservation_ww_class);
> > +
> > +     err = eb_lock_vma(eb, &acquire);
> > +     if (err)
> > +             goto out;
> > +
> > +     ww_acquire_done(&acquire);
> > +
> > +     list_for_each_entry(ev, &eb->lock, lock_link) {
> > +             struct i915_vma *vma = ev->vma;
> > +
> > +             if (err == 0)
> > +                     err = eb_vma_get_pages(eb, ev, idx);
> 
> I figure this is where you publish the proxy fence? If so, the fence 
> signaling critical path starts with this loop,

Hmm, actually at this moment, the fence is still very much internal
being only used as a reference token, and the async fence for the pages
is still only in the internal migration slot [along side the reference
tokens].

Those fences will not be attached to the dma_resv until the chains are
completed in move-to-gpu.

That might be enough of a difference to consider.
-Chris