[PATCH] drm/amdgpu: Let userptr BO ttm have TTM_PAGE_FLAG_SG set
Felix Kuehling
felix.kuehling at amd.com
Thu May 20 03:37:27 UTC 2021
I think this works for KFD userptr BOs. But this problem is probably not
specific to KFD. It's only most obvious with KFD because we rely so
heavily on userptrs.
I don't really understand why we're messing with TTM_PAGE_FLAG_SG in
amdgpu_ttm_tt_populate and amdgpu_ttm_tt_unpopulate. And why are userptr
BOs created as ttm_bo_type_device, not ttm_bo_type_sg? Christian, do you
know about the history of this code?
Either way, the patch is
Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
Thanks for looking into this!
Regards,
Felix
On 2021-05-19 at 11:15 p.m., xinhui pan wrote:
> We have hit memory corruption due to an unexpected swapout/swapin.
>
> The swapout function creates a swap storage that is filled with zeros
> and sets TTM_PAGE_FLAG_SWAPPED in ttm->page_flags. But because the
> userptr BO ttm has no backing pages at that time, no real data is
> swapped out to the swap storage.
>
> The swapin function is called during userptr BO populate because
> TTM_PAGE_FLAG_SWAPPED is set. Here is the problem: we swap data in from
> the swap storage to the ttm backend memory, which overwrites that
> memory.
>
> CPU 1                                         CPU 2
> kfd alloc BO A(userptr)                       alloc BO B(GTT)
>   -> init -> validate(create ttm)             -> init -> validate -> populate
>   init_user_pages                               -> swapout BO A
>     -> get_user_pages (fill up ttm->pages)
>     -> validate -> populate
>       -> swapin BO A // memory overwritten
>
> To fix this issue, we can set TTM_PAGE_FLAG_SG when we create the
> userptr BO ttm. The swapout function will then not swap it.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 928e8d57cd08..9a6ea966ddb2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1410,7 +1410,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>  	} else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>  		domain = AMDGPU_GEM_DOMAIN_GTT;
>  		alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
> -		alloc_flags = 0;
> +		alloc_flags = AMDGPU_AMDKFD_CREATE_USERPTR_BO;
>  		if (!offset || !*offset)
>  			return -EINVAL;
>  		user_addr = untagged_addr(*offset);
> @@ -1477,8 +1477,6 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>  	}
>  	bo->kfd_bo = *mem;
>  	(*mem)->bo = bo;
> -	if (user_addr)
> -		bo->flags |= AMDGPU_AMDKFD_CREATE_USERPTR_BO;
>  
>  	(*mem)->va = va;
>  	(*mem)->domain = domain;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c7f5cc503601..5b3f45637fb5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1119,6 +1119,10 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct ttm_buffer_object *bo,
>  		kfree(gtt);
>  		return NULL;
>  	}
> +
> +	if (abo->flags & AMDGPU_AMDKFD_CREATE_USERPTR_BO)
> +		gtt->ttm.page_flags |= TTM_PAGE_FLAG_SG;
> +
>  	return &gtt->ttm;
> }
>
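
For readers following the race in the commit message above, here is a
minimal user-space sketch of the sequence. It is not kernel code: the
model_* names, the flag values and the 16-byte buffer are all made up
for illustration, and the swapout/populate helpers only mimic the
behaviour the commit message describes (swapout skips an SG ttm,
otherwise it creates zero-filled swap storage and marks the ttm as
swapped; populate copies the swap storage back if the swapped flag is
set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for TTM_PAGE_FLAG_SWAPPED / TTM_PAGE_FLAG_SG. */
#define MODEL_FLAG_SWAPPED (1u << 0)
#define MODEL_FLAG_SG      (1u << 1)
#define MODEL_SIZE         16

/* Toy model of a ttm: flags, backing pages, and swap storage. */
struct model_tt {
	unsigned int page_flags;
	unsigned char *pages;	/* NULL until the user pages are attached */
	unsigned char *swap;	/* zero-filled swap storage after swapout */
};

/* Swapout: skip SG ttms, otherwise create zeroed swap storage. */
static void model_swapout(struct model_tt *tt)
{
	if (tt->page_flags & MODEL_FLAG_SG)
		return;

	tt->swap = calloc(1, MODEL_SIZE);
	assert(tt->swap != NULL);
	if (tt->pages)	/* nothing to copy when no pages exist yet */
		memcpy(tt->swap, tt->pages, MODEL_SIZE);
	tt->page_flags |= MODEL_FLAG_SWAPPED;
}

/* Populate: swap in if the ttm is marked as swapped. */
static void model_populate(struct model_tt *tt)
{
	assert(tt->pages != NULL);
	if (tt->page_flags & MODEL_FLAG_SWAPPED) {
		memcpy(tt->pages, tt->swap, MODEL_SIZE);
		free(tt->swap);
		tt->swap = NULL;
		tt->page_flags &= ~MODEL_FLAG_SWAPPED;
	}
}

/* Replays the CPU 1 / CPU 2 ordering; returns true if user memory changed. */
static bool model_userptr_race(bool set_sg_at_create)
{
	unsigned char user_mem[MODEL_SIZE];
	struct model_tt tt = { 0 };
	size_t i;

	memset(user_mem, 0xAB, sizeof(user_mem));
	if (set_sg_at_create)		/* the fix: flag set when the ttm is created */
		tt.page_flags |= MODEL_FLAG_SG;

	model_swapout(&tt);		/* CPU 2: swapout BO A before it has pages */
	tt.pages = user_mem;		/* CPU 1: get_user_pages fills ttm->pages */
	model_populate(&tt);		/* CPU 1: validate -> populate -> swapin */

	for (i = 0; i < sizeof(user_mem); i++)
		if (user_mem[i] != 0xAB)
			return true;
	return false;
}
```

In this model, model_userptr_race(false) reports that the user buffer
was overwritten with zeros, while model_userptr_race(true) leaves it
intact because the swapout step is skipped.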