[Intel-gfx] [PATCH 28/66] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class
Thomas Hellström (Intel)
thomas_os at shipmail.org
Tue Jul 28 11:17:15 UTC 2020
On 7/16/20 5:53 PM, Tvrtko Ursulin wrote:
>
> On 15/07/2020 16:43, Maarten Lankhorst wrote:
>> On 15-07-2020 at 13:51, Chris Wilson wrote:
>>> Our goal is to pull all memory reservations (next iteration
>>> obj->ops->get_pages()) under a ww_mutex, and to align those
>>> reservations with other drivers, i.e. control all such allocations
>>> with the reservation_ww_class. Currently, this is under the purview
>>> of the obj->mm.mutex, and while obj->mm remains an embedded struct
>>> we can "simply" switch to using the reservation_ww_class
>>> obj->base.resv->lock.
>>>
>>> The major consequence is the impact on the shrinker paths, as the
>>> reservation_ww_class is used to wrap allocations, and a ww_mutex
>>> does not support subclassing, so we cannot do our usual trick of
>>> knowing that we never recurse inside the shrinker and instead have
>>> to finish the reclaim with a trylock. This may result in us failing
>>> to release the pages after having released the vma. This will have
>>> to do until a better idea comes along.
>>>
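For illustration, the reclaim shape being described reduces to roughly
this. It is only a sketch against the generic dma_resv API;
try_shrink_object() and its glue are invented for the example and are
not part of the patch:

static unsigned long try_shrink_object(struct drm_i915_gem_object *obj)
{
    unsigned long freed = 0;

    /* ww_mutex has no mutex_lock_nested() equivalent, so a reclaim
     * path that may already sit inside another reservation can only
     * trylock and skip the object on contention.
     */
    if (!dma_resv_trylock(obj->base.resv))
        return 0;

    if (i915_gem_object_unbind(obj, 0) == 0) {
        __i915_gem_object_put_pages(obj);
        if (!i915_gem_object_has_pages(obj))
            freed = obj->base.size >> PAGE_SHIFT;
    }

    dma_resv_unlock(obj->base.resv);
    return freed;
}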
>>> However, this step only converts the mutex over and continues to
>>> treat everything as a single allocation, pinning the pages. With the
>>> ww_mutex in place we can remove the temporary pinning, as we can
>>> then reserve all storage en masse.
>>>
>>> One last thing to do: kill the implicit page pinning for active vma.
>>> This will require us to invalidate the vma->pages when the backing
>>> store is removed (and we expect that while the vma is active, we
>>> mark the backing store as active so that it cannot be removed while
>>> the HW is busy.)
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
> [snip]
>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
>>> index dc8f052a0ffe..4e928103a38f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
>>> @@ -47,10 +47,7 @@ static bool unsafe_drop_pages(struct drm_i915_gem_object *obj,
>>>     if (!(shrink & I915_SHRINK_BOUND))
>>>         flags = I915_GEM_OBJECT_UNBIND_TEST;
>>> -   if (i915_gem_object_unbind(obj, flags) == 0)
>>> -       __i915_gem_object_put_pages(obj);
>>> -
>>> -   return !i915_gem_object_has_pages(obj);
>>> +   return i915_gem_object_unbind(obj, flags) == 0;
>>> }
>>> static void try_to_writeback(struct drm_i915_gem_object *obj,
>>> @@ -199,14 +196,14 @@ i915_gem_shrink(struct drm_i915_private *i915,
>>>         spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
>>> -       if (unsafe_drop_pages(obj, shrink)) {
>>> -           /* May arrive from get_pages on another bo */
>>> -           mutex_lock(&obj->mm.lock);
>>> +       if (unsafe_drop_pages(obj, shrink) &&
>>> +           i915_gem_object_trylock(obj)) {
>
>> Why trylock? Because of the nesting? In that case, still use the ww
>> ctx if provided, please.
>
> By "if provided" you mean for code paths where we are calling the
> shrinker ourselves, as opposed to reclaim, like shmem_get_pages?
>
> That indeed sounds like the right thing to do: since all the
> get_pages from execbuf are in the reservation phase, collecting a
> list of GEM objects to lock, the ones to shrink sound like they
> should be on that list.
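Hypothetical plumbing for that idea (shrinker_lock() is invented here,
it is not in the patch): direct callers pass their acquire context and
may block, while reclaim passes NULL and degrades to trylock.

static int shrinker_lock(struct drm_i915_gem_object *obj,
                         struct ww_acquire_ctx *ww)
{
    if (!ww) /* reclaim: no context, must not block */
        return dma_resv_trylock(obj->base.resv) ? 0 : -EBUSY;

    /* may return -EDEADLK, which the caller's ww loop backs off on */
    return dma_resv_lock(obj->base.resv, ww);
}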
>
>>> +           __i915_gem_object_put_pages(obj);
>>>             if (!i915_gem_object_has_pages(obj)) {
>>>                 try_to_writeback(obj, shrink);
>>>                 count += obj->base.size >> PAGE_SHIFT;
>>>             }
>>> -           mutex_unlock(&obj->mm.lock);
>>> +           i915_gem_object_unlock(obj);
>>>         }
>>>         scanned += obj->base.size >> PAGE_SHIFT;
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
>>> index ff72ee2fd9cd..ac12e1c20e66 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
>>> @@ -265,7 +265,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
>>>      * pages to prevent them being swapped out and causing corruption
>>>      * due to the change in swizzling.
>>>      */
>>> -   mutex_lock(&obj->mm.lock);
>>>     if (i915_gem_object_has_pages(obj) &&
>>>         obj->mm.madv == I915_MADV_WILLNEED &&
>>>         i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
>>> @@ -280,7 +279,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
>>>             obj->mm.quirked = true;
>>>         }
>>>     }
>>> -   mutex_unlock(&obj->mm.lock);
>>>     spin_lock(&obj->vma.lock);
>>>     for_each_ggtt_vma(vma, obj) {
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> index e946032b13e4..80907c00c6fd 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>> @@ -129,8 +129,15 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>>>         ret = i915_gem_object_unbind(obj,
>>>                                      I915_GEM_OBJECT_UNBIND_ACTIVE |
>>>                                      I915_GEM_OBJECT_UNBIND_BARRIER);
>>> -       if (ret == 0)
>>> -           ret = __i915_gem_object_put_pages(obj);
>>> +       if (ret == 0) {
>>> +           /* ww_mutex and mmu_notifier is fs_reclaim tainted */
>>> +           if (i915_gem_object_trylock(obj)) {
>>> +               ret = __i915_gem_object_put_pages(obj);
>>> +               i915_gem_object_unlock(obj);
>>> +           } else {
>>> +               ret = -EAGAIN;
>>> +           }
>>> +       }
>>
>> I'm not sure upstream will agree with this kind of API:
>>
>> 1. It will deadlock when RT tasks are used.
>
> It will or it can? Which part? It will break out of the loop if
> trylock fails.
>
>>
>> 2. You start throwing -EAGAIN because you don't have the correct
>> ordering of locking; this needs fixing first.
>
> Is it about correct ordering of locks or something else? If memory
> allocation is allowed under dma_resv.lock, then the opposite order
> cannot be taken in any case.
>
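Spelled out as a sketch (not driver code; the two functions only name
the two sides of the inversion):

static int thread_a_get_pages(struct dma_resv *resv)
{
    void *mem;
    int err;

    err = dma_resv_lock(resv, NULL);
    if (err)
        return err;

    /* A GFP_KERNEL allocation under the reservation may enter
     * fs_reclaim and, from there, an mmu notifier...
     */
    mem = kmalloc(SZ_4K, GFP_KERNEL);
    kfree(mem);

    dma_resv_unlock(resv);
    return 0;
}

static void thread_b_notifier(struct dma_resv *resv)
{
    /* ...so blocking on the same reservation here can deadlock
     * against thread A; only trylock (plus -EAGAIN and retry) is
     * safe in this direction.
     */
    if (dma_resv_trylock(resv)) {
        /* drop the pages */
        dma_resv_unlock(resv);
    }
}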
> I've had a brief look at the amdgpu solution and maybe I
> misunderstood something, but it looks like a BKL approach with the
> device-level notifier_lock. Their userptr notifier blocks on that
> one, not on the dma_resv lock, but that also means their command
> submission (amdgpu_cs_submit) blocks on the same lock while obtaining
> the backing store.
If I read Christian right, it blocks on that lock only just before
command submission, to validate the sequence number. If there is a
mismatch, it needs to rerun the CS. I'm not sure how common userptr
buffers are, but if a device-wide mutex hurts too much, there are
perhaps more fine-grained solutions. (Like an rw semaphore, unlocked
before the fence wait in the notifier: CSes that are unaffected
shouldn't need to wait...)
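Roughly the shape I have in mind, as a sketch: my_device and my_bo are
made up, and the mmu_interval_* helpers are the generic kernel ones.

struct my_device {
    struct rw_semaphore notifier_lock;
};

struct my_bo {
    struct mmu_interval_notifier notifier;
    struct my_device *dev;
    struct dma_resv *resv;
};

static int cs_commit(struct my_bo *bo)
{
    unsigned long seq;

retry:
    seq = mmu_interval_read_begin(&bo->notifier);

    /* ...re-pin the userptr pages, build the job, all unlocked... */

    down_read(&bo->dev->notifier_lock);
    if (mmu_interval_read_retry(&bo->notifier, seq)) {
        up_read(&bo->dev->notifier_lock);
        goto retry; /* the range changed under us: rerun the CS */
    }

    /* publish the job / install fences while the read side is held */
    up_read(&bo->dev->notifier_lock);
    return 0;
}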
>
> So it looks like a big-hammer approach not directly related to the
> story of dma_resv locking. Maybe we could do the same big-hammer
> approach, although I am not sure how it is deadlock-free.
>
> What happens, for instance, if someone submits a userptr batch which
> gets unmapped while amdgpu_cs_submit is holding the notifier_lock?
My understanding is that the unmapping operation blocks on the
notifier_lock in the mmu notifier?
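I.e., reusing the made-up structures from the sketch above, something
like:

static bool my_bo_invalidate(struct mmu_interval_notifier *mni,
                             const struct mmu_notifier_range *range,
                             unsigned long cur_seq)
{
    struct my_bo *bo = container_of(mni, struct my_bo, notifier);

    if (!mmu_notifier_range_blockable(range))
        return false;

    /* The unmap blocks here until a concurrent cs_commit() has
     * either published its job or noticed the new sequence number.
     */
    down_write(&bo->dev->notifier_lock);
    mmu_interval_set_seq(mni, cur_seq);
    up_write(&bo->dev->notifier_lock);

    /* Fence wait outside the lock, so unaffected CS need not wait */
    dma_resv_wait_timeout_rcu(bo->resv, true, false,
                              MAX_SCHEDULE_TIMEOUT);
    return true;
}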
/Thomas
>
> If you understand amdgpu better, please share some insights. I
> certainly only looked at it briefly today, so I may be wrong.
>
> Regards,
>
> Tvrtko