[PATCH] drm/amdgpu: fix and cleanup amdgpu_gem_object_close

Christian König christian.koenig at amd.com
Thu Mar 12 15:22:37 UTC 2020


Am 12.03.20 um 16:03 schrieb Liu, Monk:
>>>> The problem is that dma_resv_test_signaled_rcu() tests only the shared fence if one is present.
> Okay I got the point now, but why we cannot modify dma_resv_test_signaled_rcu() to let it wait for both exclusive and shared lists ?

That is exactly what I and Xinhui said as well and what we also both 
proposed in patches, but the Intel guys are against that.

I also already proposed an extension to the dma_resv infrastructure 
where you get a list of "other" fences for stuff like this, but it 
wasn't to well received either and I can't dedicate more time to this.

Regards,
Christian.

>
>
> Ack-by: Monk Liu <monk.liu at amd.com>
> _____________________________________
> Monk Liu|GPU Virtualization Team |AMD
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> Sent: Thursday, March 12, 2020 9:42 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: fix and cleanup amdgpu_gem_object_close
>
> The problem is that dma_resv_test_signaled_rcu() tests only the shared fence if one is present.
>
> Now what happened is that the clear fence completed before the exclusive one, and that in turn caused TTM to think that the BO is unused and freed it.
>
> Christian.
>
> Am 12.03.20 um 14:25 schrieb Liu, Monk:
>> without your patch, the clear fence is also hooked in the shared list
>> of bo's reservation obj,  no matter the exclusive fence of that BO
>> signaled before clear fence or not
>>
>> since the clear fence is always kept in the bo's resv object, can you tell me what's the problem than ? are we going to loose this clear fence and still use it during the  VM pt/pd clearing ?
>>
>> thanks
>> _____________________________________
>> Monk Liu|GPU Virtualization Team |AMD
>>
>>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Thursday, March 12, 2020 8:50 PM
>> To: Liu, Monk <Monk.Liu at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>;
>> amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: fix and cleanup
>> amdgpu_gem_object_close
>>
>>    From the semantic the dma_resv object contains a single exclusive and multiple shared fences and it is mandatory that the shared fences complete after the exclusive one.
>>
>> Now what happens is that clearing the VM page tables runs asynchronously to the exclusive fence which moves the buffer around.
>>
>> And because of this Xinhui ran into a rare problem that TTM thought it could reuse the memory while in reality it was still in use.
>>
>> Regards,
>> Christian.
>>
>> Am 12.03.20 um 12:30 schrieb Liu, Monk:
>>> Can you give more details about " we can't guarantee the the clear fence will complete after the exclusive one." ?
>>>
>>> Thanks
>>>
>>> _____________________________________
>>> Monk Liu|GPU Virtualization Team |AMD
>>>
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>> Christian K?nig
>>> Sent: Thursday, March 12, 2020 7:12 PM
>>> To: Pan, Xinhui <Xinhui.Pan at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: [PATCH] drm/amdgpu: fix and cleanup amdgpu_gem_object_close
>>>
>>> The problem is that we can't add the clear fence to the BO when there is an exclusive fence on it since we can't guarantee the the clear fence will complete after the exclusive one.
>>>
>>> To fix this refactor the function and wait for any potential exclusive fence with a small timeout before adding the shared clear fence.
>>>
>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 41 +++++++++++++++----------
>>>     1 file changed, 25 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 4277125a79ee..0782162fb5f3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -161,10 +161,11 @@ void amdgpu_gem_object_close(struct
>>> drm_gem_object *obj,
>>>     
>>>     	struct amdgpu_bo_list_entry vm_pd;
>>>     	struct list_head list, duplicates;
>>> +	struct dma_fence *fence = NULL;
>>>     	struct ttm_validate_buffer tv;
>>>     	struct ww_acquire_ctx ticket;
>>>     	struct amdgpu_bo_va *bo_va;
>>> -	int r;
>>> +	long r;
>>>     
>>>     	INIT_LIST_HEAD(&list);
>>>     	INIT_LIST_HEAD(&duplicates);
>>> @@ -182,24 +183,32 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
>>>     		return;
>>>     	}
>>>     	bo_va = amdgpu_vm_bo_find(vm, bo);
>>> -	if (bo_va && --bo_va->ref_count == 0) {
>>> -		amdgpu_vm_bo_rmv(adev, bo_va);
>>> +	if (!bo_va || --bo_va->ref_count)
>>> +		goto out_unlock;
>>>     
>>> -		if (amdgpu_vm_ready(vm)) {
>>> -			struct dma_fence *fence = NULL;
>>> +	amdgpu_vm_bo_rmv(adev, bo_va);
>>> +	if (!amdgpu_vm_ready(vm))
>>> +		goto out_unlock;
>>>     
>>> -			r = amdgpu_vm_clear_freed(adev, vm, &fence);
>>> -			if (unlikely(r)) {
>>> -				dev_err(adev->dev, "failed to clear page "
>>> -					"tables on GEM object close (%d)\n", r);
>>> -			}
>>>     
>>> -			if (fence) {
>>> -				amdgpu_bo_fence(bo, fence, true);
>>> -				dma_fence_put(fence);
>>> -			}
>>> -		}
>>> -	}
>>> +	r = amdgpu_vm_clear_freed(adev, vm, &fence);
>>> +	if (r || !fence)
>>> +		goto out_unlock;
>>> +
>>> +	r = dma_resv_wait_timeout_rcu(bo->tbo.base.resv, false, false,
>>> +				      msecs_to_jiffies(10));
>>> +	if (r == 0)
>>> +		r = -ETIMEDOUT;
>>> +	if (r)
>>> +		goto out_unlock;
>>> +
>>> +	amdgpu_bo_fence(bo, fence, true);
>>> +	dma_fence_put(fence);
>>> +
>>> +out_unlock:
>>> +	if (unlikely(r))
>>> +		dev_err(adev->dev, "failed to clear page "
>>> +			"tables on GEM object close (%d)\n", r);
>>>     	ttm_eu_backoff_reservation(&ticket, &list);  }
>>>     
>>> --
>>> 2.17.1
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flis
>>> t
>>> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CM
>>> o
>>> nk.Liu%40amd.com%7C26730e56c5b944f8cbb408d7c683d4a1%7C3dd8961fe4884e6
>>> 0
>>> 8e11a82d994e183d%7C0%7C0%7C637196141815929915&sdata=yP5Yc1BWYWS93
>>> f
>>> 0hHERUfECmShwyQ5fbMkhCeG82n6M%3D&reserved=0



More information about the amd-gfx mailing list