[Intel-gfx] [PATCH 15/21] drm/i915: Coordinate i915_active with its own mutex

Chris Wilson chris at chris-wilson.co.uk
Fri Sep 20 16:32:57 UTC 2019


Quoting Tvrtko Ursulin (2019-09-20 17:14:43)
> 
> On 02/09/2019 05:02, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> > index 2b1baf2fcc8e..6d7ac129ce8a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
> > @@ -63,7 +63,7 @@ struct intel_timeline {
> >        * the request using i915_active_request_get_request_rcu(), or hold the
> 
> Looks like a stale comment.
> 
> >        * struct_mutex.
> >        */
> > -     struct i915_active_request last_request;
> > +     struct i915_active_fence last_request;
> 
> Worth renaming to last_fence now that request naming is otherwise gone 
> from i915_active?

Hmm, although i915_active is taking on more generic dma-fences, this is
still assumed to be an i915_request in a couple of places.

There are good arguments for either last_request or last_fence. If I throw
a GEM_BUG_ON(!is_request()) around, that probably swings the debate in
favour of last_request. At least it tries to capture that in some places we
do assume this is an i915_request.
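
Something like this (just a sketch; is_request() here is hypothetical
shorthand for checking the fence ops):

static inline bool is_request(struct i915_active_fence *active)
{
        struct dma_fence *fence = rcu_access_pointer(active->fence);

        /* An empty slot is fine; anything else must be an i915_request */
        return !fence || dma_fence_is_i915(fence);
}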

> >   void i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
> >   {
> >       GEM_BUG_ON(i915_active_is_idle(ref));
> >   
> > -     dma_fence_get(f);
> > -
> > -     rcu_read_lock();
> > -     if (rcu_access_pointer(ref->excl)) {
> > -             struct dma_fence *old;
> > -
> > -             old = dma_fence_get_rcu_safe(&ref->excl);
> > -             if (old) {
> > -                     if (dma_fence_remove_callback(old, &ref->excl_cb))
> > -                             atomic_dec(&ref->count);
> > -                     dma_fence_put(old);
> > -             }
> > -     }
> > -     rcu_read_unlock();
> > -
> > -     atomic_inc(&ref->count);
> > -     rcu_assign_pointer(ref->excl, f);
> > +     mutex_acquire(&ref->mutex.dep_map, 0, 0, _THIS_IP_);
> 
> This just lockdep annotation? Probably deserves a comment.

Yup.
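
Something along these lines (sketch of the comment, not the final wording):

        /*
         * We do not actually take ref->mutex here; updates are serialised
         * by the caller's outer timeline mutex. The mutex_acquire() is a
         * lockdep-only annotation so lock ordering is still validated as
         * if we held it.
         */
        mutex_acquire(&ref->mutex.dep_map, 0, 0, _THIS_IP_);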

> 
> > +     if (!__i915_active_fence_set(&ref->excl, f))
> > +             atomic_inc(&ref->count);
> 
> Refcount management is not better done from inside __i915_active_fence_set?

No, the active counting is on i915_active; i915_active_fence is just a
component.

E.g. other users want to know the fence that used to be in the
i915_active_fence:

int i915_active_fence_set(struct i915_active_fence *active,
                          struct i915_request *rq)
{
        struct dma_fence *fence;
        int err;

#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
        lockdep_assert_held(active->lock);
#endif

        /* Must maintain ordering wrt previous active requests */
        rcu_read_lock();
        fence = __i915_active_fence_set(active, &rq->fence);
        if (fence)
                fence = dma_fence_get_rcu(fence);
        rcu_read_unlock();

        if (fence) {
                err = i915_request_await_dma_fence(rq, fence);
                dma_fence_put(fence);
                if (err)
                        return err;
        }

        return 0;
}

and similarly in __i915_request_add_to_timeline()
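
which, roughly speaking, uses the returned fence to order the new request
after the previous tail of the timeline (sketch, glossing over the
engine-mask handling):

        prev = __i915_active_fence_set(&timeline->last_request, &rq->fence);
        if (prev && !i915_request_completed(to_request(prev)))
                i915_sw_fence_await_sw_fence(&rq->submit,
                                             &to_request(prev)->submit,
                                             &rq->submitq);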

> > +struct dma_fence *
> > +__i915_active_fence_set(struct i915_active_fence *active,
> > +                     struct dma_fence *fence)
> > +{
> > +     struct dma_fence *prev;
> > +     unsigned long flags;
> > +
> > +     /* NB: updates must be serialised by an outer timeline mutex */
> > +     spin_lock_irqsave(fence->lock, flags);
> > +     GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
> > +
> > +     prev = rcu_dereference_protected(active->fence, active_is_held(active));
> > +     if (prev) {
> > +             spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
> > +             __list_del_entry(&active->cb.node);
> > +             spin_unlock(prev->lock); /* serialise with prev->cb_list */
> > +             prev = rcu_access_pointer(active->fence);
> 
> Why it is important to re-read active->fence and does it then need a 
> comment?

Because we have to serialise with the prev->cb_list. ;)

The write to active->fence is made on signaling, and that is performed
under the prev->lock. So we have to take the prev->lock ourselves to flush
any concurrent callbacks before we know whether they ran (having taken the
lock, we remove the cb from the list, so if it has not run by that point it
can no longer run at all).
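
On the signaling side it boils down to something like (name made up,
retirement work omitted):

static void retire_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
        struct i915_active_fence *active =
                container_of(cb, typeof(*active), cb);

        /*
         * Runs under fence->lock (the prev->lock above); this store is
         * what the re-read of active->fence has to serialise against.
         */
        RCU_INIT_POINTER(active->fence, NULL);
}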

> How does it tie with i915_active->count refcounting which is done on the 
> basis of return value from this function?

Same as above, we have to flush the signal callback to determine whether
or not we need to increment the active count.
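
So the earlier hunk in i915_active_set_exclusive() reduces to:

        /*
         * A previous fence means its callback had not yet run, so its
         * contribution to ref->count simply transfers to the new fence;
         * only a NULL return means we are activating an empty slot and
         * need to take a fresh count.
         */
        if (!__i915_active_fence_set(&ref->excl, f))
                atomic_inc(&ref->count);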

> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 3eed2efa8d13..1aadab1cdd24 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -900,28 +900,38 @@ wait_for_timelines(struct drm_i915_private *i915,
> >   
> >       spin_lock_irqsave(&timelines->lock, flags);
> >       list_for_each_entry(tl, &timelines->active_list, link) {
> > -             struct i915_request *rq;
> > +             struct dma_fence *fence;
> >   
> > -             rq = i915_active_request_get_unlocked(&tl->last_request);
> > -             if (!rq)
> > +             fence = i915_active_fence_get(&tl->last_request);
> > +             if (!fence)
> >                       continue;
> >   
> >               spin_unlock_irqrestore(&timelines->lock, flags);
> >   
> > -             /*
> > -              * "Race-to-idle".
> > -              *
> > -              * Switching to the kernel context is often used a synchronous
> > -              * step prior to idling, e.g. in suspend for flushing all
> > -              * current operations to memory before sleeping. These we
> > -              * want to complete as quickly as possible to avoid prolonged
> > -              * stalls, so allow the gpu to boost to maximum clocks.
> > -              */
> > -             if (wait & I915_WAIT_FOR_IDLE_BOOST)
> > -                     gen6_rps_boost(rq);
> > +             if (!dma_fence_is_i915(fence)) {
> > +                     timeout = dma_fence_wait_timeout(fence,
> > +                                                      flags & I915_WAIT_INTERRUPTIBLE,
> > +                                                      timeout);
> 
> We did not have support for non-i915 fences before this patch right? 
> Would it be better to split this out in this case?

Part of the raison d'être for the patch was to allow i915_active to track
plain dma-fences, and to become less i915-specific.
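
i.e. the wait simply becomes fence-type agnostic, roughly (eliding the rps
boost on the i915 path):

        if (dma_fence_is_i915(fence))
                timeout = i915_request_wait(to_request(fence),
                                            wait, timeout);
        else
                timeout = dma_fence_wait_timeout(fence,
                                                 wait & I915_WAIT_INTERRUPTIBLE,
                                                 timeout);
        dma_fence_put(fence);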

It's trivial enough that in a couple of patches' time you will forget that
we even thought about i915_requests here ;)
-Chris

