[Intel-gfx] [PATCH 07/27] drm/i915: Coordinate i915_active with its own mutex

Fri Sep 27 12:08:51 UTC 2019

On 27/09/2019 12:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-27 12:10:29)
>>
>> On 25/09/2019 11:01, Chris Wilson wrote:
>>>    void i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
>>>    {
>>>        /* We expect the caller to manage the exclusive timeline ordering */
>>>        GEM_BUG_ON(i915_active_is_idle(ref));
>>>    
>>> -     dma_fence_get(f);
>>> -
>>> -     rcu_read_lock();
>>> -     if (rcu_access_pointer(ref->excl)) {
>>> -             struct dma_fence *old;
>>> -
>>> -             old = dma_fence_get_rcu_safe(&ref->excl);
>>> -             if (old) {
>>> -                     if (dma_fence_remove_callback(old, &ref->excl_cb))
>>> -                             atomic_dec(&ref->count);
>>> -                     dma_fence_put(old);
>>> -             }
>>> -     }
>>> -     rcu_read_unlock();
>>> -
>>> -     atomic_inc(&ref->count);
>>> -     rcu_assign_pointer(ref->excl, f);
>>> +     mutex_acquire(&ref->mutex.dep_map, 0, 0, _THIS_IP_);
>>> +     if (!__i915_active_fence_set(&ref->excl, f))
>>> +             atomic_inc(&ref->count);
>>> +     mutex_release(&ref->mutex.dep_map, 0, _THIS_IP_);
>>
>> Comment for this block please to explain what's going on.
> 
> This was meant to be clued in by the opening comment that the caller is
> expected to manage the exclusive timeline. But since the
> i915_active_fence debug API requires us to declare what mutex guards
> each timeline, we had to fake one.

I think the commentary about duality of which mutex guards things is 
only present in the commit message at the moment and it would be really 
good to have that somewhere in the code as well. Because after a few 
months of refactoring, going from git blame to commits can stop working 
so well, and then in code commentary becomes very valuable.

>>> +struct dma_fence *
>>> +__i915_active_fence_set(struct i915_active_fence *active,
>>> +                     struct dma_fence *fence)
>>
>> Can you add a short comment for this function explaining the racyness
>> and so why it returns prev and what should the callers do with it?
> 
> Before using this function, we had the callers declare what mutex guards
> this timeline and we check here that is held.

No, I mean because it has to reload prev and return it to the caller, 
which implies that is to handle some designed-in racy-ness which I think 
should be mentioned.

> 
>>> +{
>>> +     struct dma_fence *prev;
>>> +     unsigned long flags;
>>> +
>>> +     /* NB: updates must be serialised by an outer timeline mutex */
>>
>> Can't check here?
> 
> We do, see active_is_held()

Ack!

>>> +     spin_lock_irqsave(fence->lock, flags);
>>> +     GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
>>> +
>>> +     prev = rcu_dereference_protected(active->fence, active_is_held(active));
>>> +     if (prev) {
>>> +             spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
>>> +             __list_del_entry(&active->cb.node);
>>> +             spin_unlock(prev->lock); /* serialise with prev->cb_list */
>>> +             prev = rcu_access_pointer(active->fence);
>>>        }
>>> +
>>> +     rcu_assign_pointer(active->fence, fence);
>>> +     list_add_tail(&active->cb.node, &fence->cb_list);
>>> +
>>> +     spin_unlock_irqrestore(fence->lock, flags);
>>> +
>>> +     return prev;
>>>    }
>>>    
>>> -int i915_active_request_set(struct i915_active_request *active,
>>> -                         struct i915_request *rq)
>>> +int i915_active_fence_set(struct i915_active_fence *active,
>>> +                       struct i915_request *rq)
>>>    {
>>> +     struct dma_fence *fence;
>>>        int err;
>>>    
>>>    #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>>
>> Here not in diff we actually have:
>>
>> #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
>>          lockdep_assert_held(active->lock);
>>
>> So why only under GEM debug and how does it related to the NB comment in
>> __i915_active_fence_set?
> 
> We only expand the struct when we want to for debugging.

Missed that detail, ok.

Regards,

Tvrtko