[Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved

Wed Aug 21 15:34:43 UTC 2019

On 8/21/19 5:22 PM, Daniel Vetter wrote:
> On Wed, Aug 21, 2019 at 5:19 PM Thomas Hellström (VMware)
> <thomas_os at shipmail.org> wrote:
>> On 8/21/19 5:14 PM, Daniel Vetter wrote:
>>> On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware)
>>> <thomas_os at shipmail.org> wrote:
>>>> On 8/21/19 4:47 PM, Daniel Vetter wrote:
>>>>> On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware)
>>>>> <thomas_os at shipmail.org> wrote:
>>>>>> On 8/21/19 4:09 PM, Daniel Vetter wrote:
>>>>>>> On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware)
>>>>>>> <thomas_os at shipmail.org> wrote:
>>>>>>>> On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
>>>>>>>>> On 8/20/19 4:53 PM, Daniel Vetter wrote:
>>>>>>>>>> With nouveau fixed all ttm-using drives have the correct nesting of
>>>>>>>>>> mmap_sem vs dma_resv, and we can just lock the buffer.
>>>>>>>>>>
>>>>>>>>>> Assuming I didn't screw up anything with my audit of course.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
>>>>>>>>>> Cc: Christian Koenig <christian.koenig at amd.com>
>>>>>>>>>> Cc: Huang Rui <ray.huang at amd.com>
>>>>>>>>>> Cc: Gerd Hoffmann <kraxel at redhat.com>
>>>>>>>>>> Cc: "VMware Graphics" <linux-graphics-maintainer at vmware.com>
>>>>>>>>>> Cc: Thomas Hellstrom <thellstrom at vmware.com>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/gpu/drm/ttm/ttm_bo.c    | 34 ---------------------------------
>>>>>>>>>>       drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +------------------------
>>>>>>>>>>       include/drm/ttm/ttm_bo_api.h    |  1 -
>>>>>>>>>>       3 files changed, 1 insertion(+), 60 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> index 20ff56f27aa4..a952dd624b06 100644
>>>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>>>> @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device
>>>>>>>>>> *bdev)
>>>>>>>>>>               ;
>>>>>>>>>>       }
>>>>>>>>>>       EXPORT_SYMBOL(ttm_bo_swapout_all);
>>>>>>>>>> -
>>>>>>>>>> -/**
>>>>>>>>>> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object
>>>>>>>>>> to become
>>>>>>>>>> - * unreserved
>>>>>>>>>> - *
>>>>>>>>>> - * @bo: Pointer to buffer
>>>>>>>>>> - */
>>>>>>>>>> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
>>>>>>>>>> -{
>>>>>>>>>> -    int ret;
>>>>>>>>>> -
>>>>>>>>>> -    /*
>>>>>>>>>> -     * In the absense of a wait_unlocked API,
>>>>>>>>>> -     * Use the bo::wu_mutex to avoid triggering livelocks due to
>>>>>>>>>> -     * concurrent use of this function. Note that this use of
>>>>>>>>>> -     * bo::wu_mutex can go away if we change locking order to
>>>>>>>>>> -     * mmap_sem -> bo::reserve.
>>>>>>>>>> -     */
>>>>>>>>>> -    ret = mutex_lock_interruptible(&bo->wu_mutex);
>>>>>>>>>> -    if (unlikely(ret != 0))
>>>>>>>>>> -        return -ERESTARTSYS;
>>>>>>>>>> -    if (!dma_resv_is_locked(bo->base.resv))
>>>>>>>>>> -        goto out_unlock;
>>>>>>>>>> -    ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
>>>>>>>>>> -    if (ret == -EINTR)
>>>>>>>>>> -        ret = -ERESTARTSYS;
>>>>>>>>>> -    if (unlikely(ret != 0))
>>>>>>>>>> -        goto out_unlock;
>>>>>>>>>> -    dma_resv_unlock(bo->base.resv);
>>>>>>>>>> -
>>>>>>>>>> -out_unlock:
>>>>>>>>>> -    mutex_unlock(&bo->wu_mutex);
>>>>>>>>>> -    return ret;
>>>>>>>>>> -}
>>>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>> index 76eedb963693..505e1787aeea 100644
>>>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>>> @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct
>>>>>>>>>> vm_fault *vmf)
>>>>>>>>>>               &bdev->man[bo->mem.mem_type];
>>>>>>>>>>           struct vm_area_struct cvma;
>>>>>>>>>>       -    /*
>>>>>>>>>> -     * Work around locking order reversal in fault / nopfn
>>>>>>>>>> -     * between mmap_sem and bo_reserve: Perform a trylock operation
>>>>>>>>>> -     * for reserve, and if it fails, retry the fault after waiting
>>>>>>>>>> -     * for the buffer to become unreserved.
>>>>>>>>>> -     */
>>>>>>>>>> -    if (unlikely(!dma_resv_trylock(bo->base.resv))) {
>>>>>>>>>> -        if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
>>>>>>>>>> -            if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
>>>>>>>>>> -                ttm_bo_get(bo);
>>>>>>>>>> - up_read(&vmf->vma->vm_mm->mmap_sem);
>>>>>>>>>> -                (void) ttm_bo_wait_unreserved(bo);
>>>>>>>>>> -                ttm_bo_put(bo);
>>>>>>>>>> -            }
>>>>>>>>>> -
>>>>>>>>>> -            return VM_FAULT_RETRY;
>>>>>>>>>> -        }
>>>>>>>>>> -
>>>>>>>>>> -        /*
>>>>>>>>>> -         * If we'd want to change locking order to
>>>>>>>>>> -         * mmap_sem -> bo::reserve, we'd use a blocking reserve here
>>>>>>>>>> -         * instead of retrying the fault...
>>>>>>>>>> -         */
>>>>>>>>> I think you should justify why the above code is removed, since the
>>>>>>>>> comments actually outlines what to do if we change locking order.
>>>>>>>>>
>>>>>>>>> The code that's removed above is not for adjusting locking orders but
>>>>>>>>> to decrease the mm latency by releasing the mmap_sem while waiting for
>>>>>>>>> bo reserve which in turn might be waiting for GPU. At a minimum we
>>>>>>>>> should have a separate patch with justification.
>>>>>>>>>
>>>>>>>>> Note that the caller here ensures locking progress by adjusting the
>>>>>>>>> RETRY flags after a retry.
>>>>>>> That would be patches 1&2 in this series.
>>>>>>>
>>>>>> Hmm? Those seem to touch only dma-buf and nouveau not ttm?  I mean this
>>>>>> patch should look along the lines of (based on an older tree) to
>>>>>> implement the new locking-order mmap_sem->reservation,
>>>>> Only nouveau was breaking was doing copy_*_user or get_user_pages
>>>>> while holding dma_resv locks, no one else. So nothing else needed to
>>>>> be changed. But patch 1 contains the full audit. I might have missed
>>>>> something.
>>>>>
>>>>>> but to keep the mm latency optimization using the RETRY functionality:
>>>>> Still no idea why this is needed? All the comments here and the code
>>>>> and history seem like they've been about the mmap_sem vs dma_resv
>>>>> inversion between driver ioctls and fault handling here. Once that's
>>>>> officially fixed there's no reason to play games here and retry loops
>>>>> - previously that was necessary because the old ttm_bo_vm_fault had a
>>>>> busy spin and that's definitely not nice. If it's needed I think it
>>>>> should be a second patch on top, to keep this all clear. I had to
>>>>> audit an enormous amount of code, I'd like to make sure I didn't miss
>>>>> anything before we start to make this super fancy again. Further
>>>>> patches on top is obviously all fine with me.
>>>>> -Daniel
>>>> Yes, but there are two different things you remove here. One is the
>>>> workaround for the locking reversal, which is obviously correct.
>>>>
>>>> One is TTM's implementation of the mmap_sem latency optimization, which
>>>> looks like an oversight.
>>>>
>>>> That optimization is why the VM_FAULT_RETRY functionality was added to
>>>> mm in the first place and is intended to be used when drivers expect a
>>>> substantial sleep to avoid keeping the pretty globalish mmap_sem held
>>>> while that sleep is taking place. We do exactly the same while waiting
>>>> for move-fences (ttm_bo_vm_fault_idle) and other drivers that expect
>>>> long waits in the fault handler do the same.
>>> Hm, are we guaranteed that core mm will only call us once with
>>> ALLOW_RETRY?
>> Last time I looked in the implementation, yes. The ALLOW_RETRY was there
>> to specifically
>> allow making progress in the locking.
>>
>>>    Just to make sure that we're not live-locking here. I'd
>>> also like to get rid of the wu_mutex, that just looks really strange
>>> (and I thought it was to duct-tape over the inversion, not as an
>>> optimization). If the livelock due to locking inversion is gone I have
>>> no idea anymore why we even needs the wu_mutex.
>> Yes, my interpretation of this is that wu_mutex definitely can be ditched.
> Ok I'll respin and just do normal interruptible locks, keeping the
> mmap_sem-less path. I think perfectly ok to leave the optimization in,
> as long as we can remove the wu_mutex.
>
> btw r-b/t-b on patch 1 from vmwgfx would be very much appreciated.
> That one is the real trick in this series here I think.

Thanks!

I'll look into patch 1 as well.

/Thomas

> -Daniel