[PATCH 9/9] drm/amdgpu: Lock the attached dmabuf in unpopulate
Felix Kuehling
felix.kuehling at amd.com
Thu Apr 15 14:29:36 UTC 2021
Am 2021-04-15 um 3:41 a.m. schrieb Christian König:
>
>
> Am 14.04.21 um 17:15 schrieb Felix Kuehling:
>> Am 2021-04-14 um 3:33 a.m. schrieb Christian König:
>>> Am 14.04.21 um 08:46 schrieb Felix Kuehling:
>>>> amdgpu_ttm_tt_unpopulate can be called during bo_destroy. The
>>>> dmabuf->resv
>>>> must not be held by the caller or dma_buf_detach will deadlock.
>>>> This is
>>>> probably not the right fix. I get a recursive lock warning with the
>>>> reservation held in ttm_bo_release. Should unmap_attachment move to
>>>> backend_unbind instead?
>>> Yes probably, but I'm really wondering if we should call unpopulate
>>> without holding the reservation lock.
>> There is an error handling code path in ttm_tt_populate that calls
>> unpopulate.
>
> That should be harmless. For populating the page array we need the
> same lock as for unpopulating it.
>
>> I believe that has to be holding the reservation lock.
>
> Correct, yes.
>
>> The other cases (destroy and swapout) do not hold the lock, AIUI.
>
> That's not correct. See ttm_bo_release() for example:
>
> ...
> if (!dma_resv_test_signaled_rcu(bo->base.resv, true) ||
> !dma_resv_trylock(bo->base.resv)) {
> ...
>
> We intentionally lock the reservation object here or put it on the
> delayed delete list because dropping the tt object without holding the
> lock is illegal for multiple reasons.
I think this is because I manually individualized the reservation in
patch 4. Without that I was running into different problems (probably
need to dig a bit more to understand what's happening there). So the
lock held by release is not the same as the lock of the original dmabuf.
Regards,
Felix
>
> If you run into an unpopulate which doesn't hold the lock then I
> really need that backtrace because we are running into a much larger
> bug here.
>
> Thanks,
> Christian.
>
>
>>
>> Regards,
>> Felix
>>
>>
>>> Christian.
>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 +++++++++++++
>>>> 1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index 936b3cfdde55..257750921eed 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -1216,9 +1216,22 @@ static void amdgpu_ttm_tt_unpopulate(struct
>>>> ttm_device *bdev,
>>>> if (ttm->sg && gtt->gobj->import_attach) {
>>>> struct dma_buf_attachment *attach;
>>>> + bool locked;
>>>> attach = gtt->gobj->import_attach;
>>>> + /* FIXME: unpopulate can be called during bo_destroy.
>>>> + * The dmabuf->resv must not be held by the caller or
>>>> + * dma_buf_detach will deadlock. This is probably not
>>>> + * the right fix. I get a recursive lock warning with the
>>>> + * reservation held in ttm_bo_releas.. Should
>>>> + * unmap_attachment move to backend_unbind instead?
>>>> + */
>>>> + locked = dma_resv_is_locked(attach->dmabuf->resv);
>>>> + if (!locked)
>>>> + dma_resv_lock(attach->dmabuf->resv, NULL);
>>>> dma_buf_unmap_attachment(attach, ttm->sg,
>>>> DMA_BIDIRECTIONAL);
>>>> + if (!locked)
>>>> + dma_resv_unlock(attach->dmabuf->resv);
>>>> ttm->sg = NULL;
>>>> return;
>>>> }
>
More information about the amd-gfx
mailing list