[PATCH] drm/amdgpu: add VM update fences back to the root PD v2

Wed Feb 19 16:34:30 UTC 2020

For amd-staging-drm-next you need the first version of the patch.

For drm-misc-next or drm-next you need the second version of the patch.

We probably need to merge the patch through drm-misc-next anyway, since 
the patch that causes the problem is there as well.

Christian.

On 19.02.20 at 16:47, Tom St Denis wrote:
> The tip of origin/amd-staging-drm-next for me is:
>
> commit 7fd3b632e17e55c5ffd008f9f025754e7daa1b66
> Refs: {origin/amd-staging-drm-next}, v5.4-rc7-2847-g7fd3b632e17e
> Author:     Monk Liu <Monk.Liu at amd.com>
> AuthorDate: Thu Feb 6 23:55:58 2020 +0800
> Commit:     Monk Liu <Monk.Liu at amd.com>
> CommitDate: Wed Feb 19 13:33:02 2020 +0800
>
>     drm/amdgpu: fix colliding of preemption
>
>     what:
>     some os preemption path is messed up with world switch preemption
>
>     fix:
>     cleanup those logics so os preemption not mixed with world switch
>
>     this patch is a general fix for issues comes from SRIOV MCBP, but
>     there is still UMD side issues not resovlved yet, so this patch
>     cannot fix all world switch bug.
>
>     Signed-off-by: Monk Liu <Monk.Liu at amd.com>
>     Acked-by: Hawking Zhang <Hawking.Zhang at amd.com>
>
> Which I had fetched just an hour ago.
>
> On 2020-02-19 10:41 a.m., Christian König wrote:
>> Well what branch are you trying to merge that into?
>>
>> The conflict resolution should be simple, just keep the 
>> vm->update_funcs->prepare(...) line as it is in your branch.
>>
>> If you get those errors, something went wrong in your rebase.
>>
>> Christian.
>>
>> On 19.02.20 at 16:14, Tom St Denis wrote:
>>> Doesn't build even with conflict resolved:
>>>
>>> [root@raven linux]# make
>>>   CALL    scripts/checksyscalls.sh
>>>   CALL    scripts/atomic/check-atomics.sh
>>>   DESCEND  objtool
>>>   CHK     include/generated/compile.h
>>>   CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.o
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c: In function ‘amdgpu_vm_bo_update_mapping’:
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1612:41: error: ‘owner’ undeclared (first use in this function)
>>>  1612 |  r = vm->update_funcs->prepare(&params, owner, exclusive);
>>>       |                                         ^~~~~
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1612:41: note: each undeclared identifier is reported only once for each function it appears in
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1612:48: error: ‘exclusive’ undeclared (first use in this function)
>>>  1612 |  r = vm->update_funcs->prepare(&params, owner, exclusive);
>>>       |                                                ^~~~~~~~~
>>> make[4]: *** [scripts/Makefile.build:266: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.o] Error 1
>>> make[3]: *** [scripts/Makefile.build:509: drivers/gpu/drm/amd/amdgpu] Error 2
>>> make[2]: *** [scripts/Makefile.build:509: drivers/gpu/drm] Error 2
>>> make[1]: *** [scripts/Makefile.build:509: drivers/gpu] Error 2
>>> make: *** [Makefile:1649: drivers] Error 2
>>>
>>> Should I just move to drm-misc-next?
>>>
>>> tom
>>>
>>> On 2020-02-19 10:02 a.m., Christian König wrote:
>>>> Add update fences to the root PD while mapping BOs.
>>>>
>>>> Otherwise PDs freed during the mapping won't wait for
>>>> updates to finish and can cause corruptions.
>>>>
>>>> v2: rebased on drm-misc-next
>>>>
>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>> Fixes: 90b69cdc5f159 ("drm/amdgpu: stop adding VM updates fences to the resv obj")
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 14 ++++++++++++--
>>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index d16231d6a790..ef73fa94f357 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -588,8 +588,8 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
>>>>   {
>>>>       entry->priority = 0;
>>>>       entry->tv.bo = &vm->root.base.bo->tbo;
>>>> -    /* One for TTM and one for the CS job */
>>>> -    entry->tv.num_shared = 2;
>>>> +    /* Two for VM updates, one for TTM and one for the CS job */
>>>> +    entry->tv.num_shared = 4;
>>>>       entry->user_pages = NULL;
>>>>       list_add(&entry->tv.head, validated);
>>>>   }
>>>> @@ -1591,6 +1591,16 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>           goto error_unlock;
>>>>       }
>>>> +    if (flags & AMDGPU_PTE_VALID) {
>>>> +        struct amdgpu_bo *root = vm->root.base.bo;
>>>> +
>>>> +        if (!dma_fence_is_signaled(vm->last_direct))
>>>> +            amdgpu_bo_fence(root, vm->last_direct, true);
>>>> +
>>>> +        if (!dma_fence_is_signaled(vm->last_delayed))
>>>> +            amdgpu_bo_fence(root, vm->last_delayed, true);
>>>> +    }
>>>> +
>>>>       r = vm->update_funcs->prepare(&params, owner, exclusive);
>>>>       if (r)
>>>>           goto error_unlock;
>>
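
The second hunk of the quoted patch can be pictured with a small userspace
model. This is only a sketch and not kernel code: every toy_* name is
hypothetical, a "fence" is reduced to a single flag, and the root PD's
reservation object is reduced to a fixed array whose size mirrors the
num_shared = 4 reservation from the first hunk. It only illustrates the idea
stated in the commit message: for valid mappings, the still-pending direct and
delayed update fences are attached to the root PD so that anything waiting on
the root PD also waits for those updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence: only records whether the work finished. */
struct toy_fence {
    bool signaled;
};

/* Mirrors the num_shared = 4 reservation in the patch (assumption: a fixed
 * array is enough for this illustration). */
#define TOY_MAX_SHARED 4

/* Hypothetical stand-in for the shared fence slots of the root PD. */
struct toy_root_pd {
    const struct toy_fence *shared[TOY_MAX_SHARED];
    size_t count;
};

struct toy_vm {
    struct toy_root_pd root;
    struct toy_fence *last_direct;   /* last immediate page table update */
    struct toy_fence *last_delayed;  /* last scheduled page table update */
};

/* Attach a fence to a free shared slot of the root PD. */
static void toy_add_shared(struct toy_root_pd *root, const struct toy_fence *f)
{
    assert(root->count < TOY_MAX_SHARED);
    root->shared[root->count++] = f;
}

/* Same shape as the added hunk: only for valid mappings, and only for
 * fences that have not signaled yet. */
static void toy_fence_root_for_mapping(struct toy_vm *vm, bool pte_valid)
{
    if (!pte_valid)
        return;

    if (!vm->last_direct->signaled)
        toy_add_shared(&vm->root, vm->last_direct);

    if (!vm->last_delayed->signaled)
        toy_add_shared(&vm->root, vm->last_delayed);
}

/* True once every attached fence has signaled, i.e. nothing is pending. */
static bool toy_root_idle(const struct toy_root_pd *root)
{
    for (size_t i = 0; i < root->count; i++)
        if (!root->shared[i]->signaled)
            return false;
    return true;
}

/* Usage: the direct update is already done, the delayed one is pending. */
static size_t toy_example(void)
{
    struct toy_fence direct = { .signaled = true };
    struct toy_fence delayed = { .signaled = false };
    struct toy_vm vm = { .last_direct = &direct, .last_delayed = &delayed };

    toy_fence_root_for_mapping(&vm, true);
    assert(!toy_root_idle(&vm.root));  /* a waiter would still block here */

    delayed.signaled = true;           /* the pending update completes */
    assert(toy_root_idle(&vm.root));

    return vm.root.count;              /* only the pending fence was attached */
}
```

In this model, calling toy_example() attaches only the delayed fence, because
the direct one had already signaled. That corresponds to the
dma_fence_is_signaled() checks in the hunk, which skip fences that no longer
need a slot.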