[PATCH] drm/amdgpu: fix VM sync with always valid BOs
Felix Kuehling
felix.kuehling at amd.com
Fri Sep 8 21:12:24 UTC 2017
I think this change could use a little more explanation. I also have
some ideas for simplifying the code and making it easier to understand.
See comments inline.
But it looks good to me otherwise. Reviewed-by: Felix Kuehling
<Felix.Kuehling at amd.com>
Regards,
Felix
On 2017-09-08 08:18 AM, Christian König wrote:
> From: Christian König <christian.koenig at amd.com>
>
> All users of a VM must always wait for updates with always
> valid BOs to be completed.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 14 ++++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 ++++++++++-----
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
> 3 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 8aa37e0..d6e66b7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -752,10 +752,6 @@ static int amdgpu_bo_vm_update_pte(struct amdgpu_cs_parser *p)
> if (r)
> return r;
>
> - r = amdgpu_sync_fence(adev, &p->job->sync, vm->last_dir_update);
> - if (r)
> - return r;
> -
> r = amdgpu_vm_clear_freed(adev, vm, NULL);
> if (r)
> return r;
> @@ -797,6 +793,10 @@ static int amdgpu_bo_vm_update_pte(struct amdgpu_cs_parser *p)
> if (bo_va == NULL)
> continue;
>
> + if (bo_va->base.bo->flags &
> + AMDGPU_GEM_CREATE_VM_ALWAYS_VALID)
> + continue;
This looks like somewhat unrelated. If you don't make it a separate
patch, at least mention it in the change description. Something like:
Always-valid BOs' page table entries are updated by
amdgpu_vm_handle_moved. Skip them in amdgpu_bo_vm_update_pte.
> +
> r = amdgpu_vm_bo_update(adev, bo_va, false);
> if (r)
> return r;
> @@ -810,6 +810,12 @@ static int amdgpu_bo_vm_update_pte(struct amdgpu_cs_parser *p)
> }
>
> r = amdgpu_vm_handle_moved(adev, vm, &p->job->sync);
> + if (r)
> + return r;
> +
> + r = amdgpu_sync_fence(adev, &p->job->sync, vm->mandatory_update);
> + if (r)
> + return r;
>
> if (amdgpu_vm_debug && p->bo_list) {
> /* Invalidate all BOs to test for userspace bugs */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 55f1ecb..12c8a4c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1140,9 +1140,8 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,
> goto error_free;
>
> amdgpu_bo_fence(parent->base.bo, fence, true);
> - dma_fence_put(vm->last_dir_update);
> - vm->last_dir_update = dma_fence_get(fence);
> - dma_fence_put(fence);
> + dma_fence_put(vm->mandatory_update);
> + vm->mandatory_update = fence;
> }
> }
>
> @@ -1803,6 +1802,12 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
> trace_amdgpu_vm_bo_mapping(mapping);
> }
>
> + if (bo_va->base.bo &&
> + bo_va->base.bo->tbo.resv == vm->root.base.bo->tbo.resv) {
> + dma_fence_put(vm->mandatory_update);
> + vm->mandatory_update = dma_fence_get(bo_va->last_pt_update);
> + }
> +
A suggestion: This could be handled with fewer cryptic conditions
littered throughout the code by passing a struct dma_fence **fence
parameter to amdgpu_vm_bo_update. When this is called for updating VM
BOs (page tables or always valid) in handle_moved, pass
vm->mandatory_update as the fence pointer.
> return 0;
> }
>
> @@ -2586,7 +2591,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> vm->use_cpu_for_update ? "CPU" : "SDMA");
> WARN_ONCE((vm->use_cpu_for_update & !amdgpu_vm_is_large_bar(adev)),
> "CPU update of VM recommended only for large BAR system\n");
> - vm->last_dir_update = NULL;
> + vm->mandatory_update = NULL;
>
> flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
> AMDGPU_GEM_CREATE_VRAM_CLEARED;
> @@ -2692,7 +2697,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
> }
>
> amdgpu_vm_free_levels(&vm->root);
> - dma_fence_put(vm->last_dir_update);
> + dma_fence_put(vm->mandatory_update);
> for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
> amdgpu_vm_free_reserved_vmid(adev, vm, i);
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index c1accd1..63fa2e5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -140,7 +140,7 @@ struct amdgpu_vm {
>
> /* contains the page directory */
> struct amdgpu_vm_pt root;
> - struct dma_fence *last_dir_update;
> + struct dma_fence *mandatory_update;
I'm not sure about this renaming. I don't find the new name clearer than
the old one. What's mandatory? The update, or syncing with the fence? I
think the name should reflect whatever completes when the fence signals.
I can see that last_dir_update is no longer accurate because it also
includes other VM-local (always valid) BOs. Maybe last_vm_update?
I'm thinking with the fence pointer idea above, maybe you could get rid
of this completely. Just return a fence from each amdgpu_vm_bo_update
call and also return that fence from amdgpu_vm_handle_moved. Then
amdgpu_bo_vm_update_pte can treat the fences from regular BOs, always
valid BOs and page tables all in the same way.
>
> /* protecting freed */
> spinlock_t freed_lock;
More information about the amd-gfx
mailing list