[PATCH 1/2] drm/amdgpu: Always sync fence before unlocking eviction_lock

Pan, Xinhui Xinhui.Pan at amd.com
Fri Mar 13 10:40:43 UTC 2020



> On March 13, 2020 at 18:23, Koenig, Christian <Christian.Koenig at amd.com> wrote:
> 
> On 13.03.20 at 11:21, Pan, Xinhui wrote:
>> 
>>> On March 13, 2020 at 17:55, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>> 
>>> On 13.03.20 at 10:29, Pan, Xinhui wrote:
>>>>> On March 13, 2020 at 16:52, Koenig, Christian <Christian.Koenig at amd.com> wrote:
>>>>> 
>>>>> On 13.03.20 at 08:43, xinhui pan wrote:
>>>>>> The fence generated in ->commit is a shared one, so add it to resv.
>>>>>> And we need to do that with the eviction lock held.
>>>>>> 
>>>>>> Currently we only sync last_direct/last_delayed before ->prepare. But we
>>>>>> fail to sync the last fence generated by ->commit. That causes problems
>>>>>> if eviction happens later, since it does not sync the last fence.
>>>>> NAK, that won't work.
>>>>> 
>>>>> We can only add fences when the dma_resv object is locked and that is only the case when validating.
>>>>> 
>>>> well, that is true.
>>>> but considering this is a PT BO, only eviction races on it AFAIK.
>>>> as for the individualized resv in bo release, we unref the PT BO just after that.
>>>> I am still thinking of other races in the real world.
>>> We should probably just add all pipelined/delayed submissions directly to the reservation object in amdgpu_vm_sdma_commit().
>>> 
>>> Only the direct and invalidating submissions can't be added because we can't grab the reservation object in the MMU notifier.

wait, I see amdgpu_vm_handle_fault will grab the resv lock of the root BO.
so there is no race then?

>>> 
>>> Can you prepare a patch for this?
>>> 
>> yep, I can.
>> Adding the fence to the BO resv in every commit introduces a little overhead?
> 
> Yes it does, but we used to have this before and it wasn't really measurable.
> 
> With the unusual exception of mapping really large chunks of fragmented system memory we only use one commit for anything <1GB anyway.
> 
> Christian.
> 
>> As we only need to add the last fence to the resv, given that the job scheduler ring is FIFO.
>> yes, the code should be simple anyway as long as it works.
>> 
>> thanks
>> xinhui
>> 
>>> Regards,
>>> Christian.
>>> 
>>>> thanks
>>>> xinhui
>>>> 
>>>>> I'm considering just partially reverting the patch that originally stopped adding fences, and instead only not adding them when invalidating in a direct submit.
>>>>> 
>>>>> Christian.
>>>>> 
>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>>>>> Cc: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
>>>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> index 73398831196f..f424b5969930 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>> @@ -1582,6 +1582,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>>>  	struct amdgpu_vm_update_params params;
>>>>>>  	enum amdgpu_sync_mode sync_mode;
>>>>>>  	int r;
>>>>>> +	struct amdgpu_bo *root = vm->root.base.bo;
>>>>>> 
>>>>>>  	memset(&params, 0, sizeof(params));
>>>>>>  	params.adev = adev;
>>>>>> @@ -1604,8 +1605,6 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>>>  	}
>>>>>> 
>>>>>>  	if (flags & AMDGPU_PTE_VALID) {
>>>>>> -		struct amdgpu_bo *root = vm->root.base.bo;
>>>>>> -
>>>>>>  		if (!dma_fence_is_signaled(vm->last_direct))
>>>>>>  			amdgpu_bo_fence(root, vm->last_direct, true);
>>>>>> 
>>>>>> @@ -1623,6 +1622,12 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>>> 
>>>>>>  	r = vm->update_funcs->commit(&params, fence);
>>>>>> 
>>>>>> +	if (!dma_fence_is_signaled(vm->last_direct))
>>>>>> +		amdgpu_bo_fence(root, vm->last_direct, true);
>>>>>> +
>>>>>> +	if (!dma_fence_is_signaled(vm->last_delayed))
>>>>>> +		amdgpu_bo_fence(root, vm->last_delayed, true);
>>>>>> +
>>>>>>  error_unlock:
>>>>>>  	amdgpu_vm_eviction_unlock(vm);
>>>>>>  	return r;
> 


