[Intel-gfx] [PATCH 28/66] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class

Thomas Hellström (Intel) thomas_os at shipmail.org
Wed Jul 29 13:44:41 UTC 2020


On 7/29/20 2:17 PM, Tvrtko Ursulin wrote:
>
> On 28/07/2020 12:17, Thomas Hellström (Intel) wrote:
>> On 7/16/20 5:53 PM, Tvrtko Ursulin wrote:
>>> On 15/07/2020 16:43, Maarten Lankhorst wrote:
>>>> On 15-07-2020 at 13:51, Chris Wilson wrote:
>>>>> Our goal is to pull all memory reservations (next iteration
>>>>> obj->ops->get_pages()) under a ww_mutex, and to align those
>>>>> reservations with other drivers, i.e. control all such allocations
>>>>> with the reservation_ww_class. Currently, this is under the purview
>>>>> of the obj->mm.mutex, and while obj->mm remains an embedded struct
>>>>> we can "simply" switch to using the reservation_ww_class,
>>>>> obj->base.resv->lock.
>>>>>
>>>>> The major consequence is the impact on the shrinker paths as the
>>>>> reservation_ww_class is used to wrap allocations, and a ww_mutex does
>>>>> not support subclassing so we cannot do our usual trick of knowing
>>>>> that we never recurse inside the shrinker and instead have to finish
>>>>> the reclaim with a trylock. This may result in us failing to release
>>>>> the pages after having released the vma. This will have to do until a
>>>>> better idea comes along.
>>>>>
>>>>> However, this step only converts the mutex over and continues to
>>>>> treat everything as a single allocation and pinning the pages. With
>>>>> the ww_mutex in place we can remove the temporary pinning, as we can
>>>>> then reserve all storage en masse.
>>>>>
>>>>> One last thing to do: kill the implicit page pinning for active vma.
>>>>> This will require us to invalidate the vma->pages when the backing
>>>>> store is removed (and we expect that while the vma is active, we mark
>>>>> the backing store as active so that it cannot be removed while the HW
>>>>> is busy).
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> [snip]
>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
>>>>> index dc8f052a0ffe..4e928103a38f 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
>>>>> @@ -47,10 +47,7 @@ static bool unsafe_drop_pages(struct drm_i915_gem_object *obj,
>>>>>       if (!(shrink & I915_SHRINK_BOUND))
>>>>>           flags = I915_GEM_OBJECT_UNBIND_TEST;
>>>>>   -    if (i915_gem_object_unbind(obj, flags) == 0)
>>>>> -        __i915_gem_object_put_pages(obj);
>>>>> -
>>>>> -    return !i915_gem_object_has_pages(obj);
>>>>> +    return i915_gem_object_unbind(obj, flags) == 0;
>>>>>   }
>>>>>     static void try_to_writeback(struct drm_i915_gem_object *obj,
>>>>> @@ -199,14 +196,14 @@ i915_gem_shrink(struct drm_i915_private *i915,
>>>>>                 spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
>>>>>   -            if (unsafe_drop_pages(obj, shrink)) {
>>>>> -                /* May arrive from get_pages on another bo */
>>>>> -                mutex_lock(&obj->mm.lock);
>>>>> +            if (unsafe_drop_pages(obj, shrink) &&
>>>>> +                i915_gem_object_trylock(obj)) {
>>>
>>>> Why trylock? Because of the nesting? In that case, please still use
>>>> the ww ctx if provided.
>>>
>>> By "if provided" you mean for code paths where we are calling the 
>>> shrinker ourselves, as opposed to reclaim, like shmem_get_pages?
>>>
>>> That indeed sounds like the right thing to do: since all the
>>> get_pages from execbuf are in the reservation phase, collecting a
>>> list of GEM objects to lock, the ones to shrink sound like they
>>> should be on that list.
>>>
>>>>> +                  __i915_gem_object_put_pages(obj);
>>>>>                   if (!i915_gem_object_has_pages(obj)) {
>>>>>                       try_to_writeback(obj, shrink);
>>>>>                       count += obj->base.size >> PAGE_SHIFT;
>>>>>                   }
>>>>> -                mutex_unlock(&obj->mm.lock);
>>>>> +                i915_gem_object_unlock(obj);
>>>>>               }
>>>>>                 scanned += obj->base.size >> PAGE_SHIFT;
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
>>>>> index ff72ee2fd9cd..ac12e1c20e66 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
>>>>> @@ -265,7 +265,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
>>>>>        * pages to prevent them being swapped out and causing corruption
>>>>>        * due to the change in swizzling.
>>>>>        */
>>>>> -    mutex_lock(&obj->mm.lock);
>>>>>       if (i915_gem_object_has_pages(obj) &&
>>>>>           obj->mm.madv == I915_MADV_WILLNEED &&
>>>>>           i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
>>>>> @@ -280,7 +279,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
>>>>>               obj->mm.quirked = true;
>>>>>           }
>>>>>       }
>>>>> -    mutex_unlock(&obj->mm.lock);
>>>>>         spin_lock(&obj->vma.lock);
>>>>>       for_each_ggtt_vma(vma, obj) {
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>>>> index e946032b13e4..80907c00c6fd 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
>>>>> @@ -129,8 +129,15 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>>>>>           ret = i915_gem_object_unbind(obj,
>>>>>                            I915_GEM_OBJECT_UNBIND_ACTIVE |
>>>>>                            I915_GEM_OBJECT_UNBIND_BARRIER);
>>>>> -        if (ret == 0)
>>>>> -            ret = __i915_gem_object_put_pages(obj);
>>>>> +        if (ret == 0) {
>>>>> +            /* ww_mutex and mmu_notifier are fs_reclaim tainted */
>>>>> +            if (i915_gem_object_trylock(obj)) {
>>>>> +                ret = __i915_gem_object_put_pages(obj);
>>>>> +                i915_gem_object_unlock(obj);
>>>>> +            } else {
>>>>> +                ret = -EAGAIN;
>>>>> +            }
>>>>> +        }
>>>>
>>>> I'm not sure upstream will agree with this kind of API:
>>>>
>>>> 1. It will deadlock when RT tasks are used.
>>>
>>> It will or it can? Which part? It will break out of the loop if 
>>> trylock fails.
>>>
>>>>
>>>> 2. You start throwing -EAGAIN because you don't have the correct
>>>> ordering of locking; this needs fixing first.
>>>
>>> Is it about correct ordering of locks or something else? If memory 
>>> allocation is allowed under dma_resv.lock, then the opposite order 
>>> cannot be taken in any case.
>>>
>>> I've had a brief look at the amdgpu solution and maybe I
>>> misunderstood something, but it looks like a BKL approach with the
>>> device-level notifier_lock. Their userptr notifier blocks on that
>>> one, not on the dma_resv lock, but that also means their command
>>> submission (amdgpu_cs_submit) blocks on the same lock while
>>> obtaining the backing store.
>>
>> If I read Christian right, it blocks on that lock only just before
>> command submission, to validate the sequence number. If there is a
>> mismatch, it needs to rerun the CS. I'm not sure how common userptr
>> buffers are, but if a device-wide mutex hurts too much, there are
>> perhaps more fine-grained solutions. (Like an rw semaphore, and
>> unlocking before the fence wait in the notifier: CSes that are
>> unaffected shouldn't need to wait...)
>
> Right, I wasn't familiar with the mmu notifier seqno business. Hm, i915
> and amdgpu seem to use different mmu notifier hooks. Could we use the
> same one, and does that mean "locking order" is irrelevant to the
> sub-story of userptr?
>
>>
>>>
>>> So it looks like a big hammer approach not directly related to the 
>>> story of dma_resv locking. Maybe we could do the same big hammer 
>>> approach, although I am not sure how it is deadlock free.
>>>
>>> What happens for instance if someone submits an userptr batch which 
>>> gets unmapped while amdgpu_cs_submit is holding the notifier_lock?
>>
>> My understanding is that the unmapping operation blocks on the
>> notifier_lock in the mmu notifier?
>
> Yes, the key will be understanding the difference between
> invalidate_range_start and invalidate+seqno, and whether we can use the
> same approach from i915 to avoid the trylock.
>
> But given how the ww mutex is fs_reclaim tainted, does that mean we
> would also need to ensure there are no memory allocations under the ww
> mutex?

So with the fs_reclaim tainting, we can do memory allocations under the
ww_mutex, but we need trylocks in the shrinkers and mmu_notifiers.
However, get_user_pages() must not run under the ww_mutex, due to the
mmap_sem inversion.
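
To illustrate the ordering I mean, a minimal sketch (the helper and its
parameters are made up, this is not the actual i915 code):

#include <linux/mm.h>

/*
 * Sketch: pin the userptr pages *before* taking the object's
 * reservation_ww_class lock. gup takes mmap_sem, and since the mmu
 * notifiers (which run with mmap_sem held) may trylock the ww class,
 * calling gup under the ww_mutex would invert that ordering.
 */
static int userptr_pages_then_lock(struct drm_i915_gem_object *obj,
                                   unsigned long uptr, int num_pages,
                                   struct page **pvec)
{
        long pinned;

        /* 1) gup with no ww_mutex held; short pins not handled here. */
        pinned = pin_user_pages_fast(uptr, num_pages, FOLL_WRITE, pvec);
        if (pinned < 0)
                return pinned;

        /* 2) Only now take the ww class; allocating under it is fine. */
        i915_gem_object_lock(obj);
        /* ... install pvec as the backing store ... */
        i915_gem_object_unlock(obj);

        return 0;
}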

But with the AMD solution, I figure we don't really need the ww_mutex in
the mmu_notifier at all? That would possibly mean holding on to, and
exposing to the GPU, pages that are no longer in use, but a shrinker
would notice the sequence number mismatch and consider the pages
purgeable, and on the next CS the stale pages would be released anyway.
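
For reference, the seqno dance with the interval-notifier API would look
roughly like this; everything except the mmu_interval_*() calls is a
made-up name:

#include <linux/mmu_notifier.h>

/* bo->notifier is an mmu_interval_notifier registered on the userptr
 * range; driver_lock is whatever serializes against the invalidate
 * callback (amdgpu uses a per-device notifier_lock). */
static int submit_userptr(struct my_userptr_bo *bo)
{
        unsigned long seq;
        int ret;

again:
        seq = mmu_interval_read_begin(&bo->notifier);

        ret = repopulate_pages(bo);     /* gup, no relevant locks held */
        if (ret)
                return ret;

        mutex_lock(&bo->driver_lock);
        if (mmu_interval_read_retry(&bo->notifier, seq)) {
                /* Range was invalidated in the meantime: drop and redo. */
                mutex_unlock(&bo->driver_lock);
                goto again;
        }

        /* Publish the job fence while still holding driver_lock, so
         * the invalidate callback will see and wait for it. */
        ret = publish_job_fence(bo);
        mutex_unlock(&bo->driver_lock);

        return ret;
}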

Still, if we can't live with that either, we can free the pages and kill
the GPU bindings with an async work item from the mmu_notifier.
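
As a sketch, assuming a release_work field that is INIT_WORK'ed at
object creation (all names here are illustrative, not existing i915
code):

#include <linux/workqueue.h>

/* Worker context: taking the ww class here is fine, since we are no
 * longer inside the notifier / fs_reclaim context. */
static void userptr_release_worker(struct work_struct *work)
{
        struct drm_i915_gem_object *obj =
                container_of(work, typeof(*obj), userptr.release_work);

        i915_gem_object_lock(obj);
        __i915_gem_object_put_pages(obj);
        i915_gem_object_unlock(obj);

        i915_gem_object_put(obj);       /* ref taken when queuing */
}

/* In invalidate_range_start, after the unbind, instead of the trylock: */
static void defer_put_pages(struct drm_i915_gem_object *obj)
{
        i915_gem_object_get(obj);
        queue_work(system_wq, &obj->userptr.release_work);
}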

Danvet suggested at some point breaking out the synchronization part of
the AMD solution and making drm helpers out of it. I think that makes
sense.

/Thomas

>
> Regards,
>
> Tvrtko

>
>
>>
>> /Thomas
>>
>>
>>>
>>> If you understand amdgpu better please share some insights. I 
>>> certainly only looked at it briefly today so may be wrong.
>>>
>>> Regards,
>>>
>>> Tvrtko