[PATCH 3/3] drm/amdkfd: classify and map mixed svm range pages in GPU

Felix Kuehling felix.kuehling at amd.com
Thu May 6 00:48:47 UTC 2021


On 2021-05-05 at 6:59 p.m., Alex Sierra wrote:
> [Why]
> svm ranges can have mixed pages from device or system memory.
> A good example: after a prange has been allocated in VRAM, a
> copy-on-write triggered by a fork invalidates some pages inside the
> prange, ending up with mixed pages.
>
> [How]
> Each page inside a prange is classified based on its type, device or
> system memory, during the dma mapping call. If a page corresponds to
> the VRAM domain, a flag is set in its dma_addr entry for each GPU.
> Then, at GPU page table mapping time, each group of contiguous pages
> of the same type is mapped with its proper pte flags.
>
> Signed-off-by: Alex Sierra <alex.sierra at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 58 +++++++++++++++++++---------
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 +
>  2 files changed, 42 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 7104762a0119..94fe9ab70e94 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -141,6 +141,12 @@ svm_range_dma_map_dev(struct device *dev, dma_addr_t **dma_addr,
>  			dma_unmap_page(dev, addr[i], PAGE_SIZE, dir);
>  
>  		page = hmm_pfn_to_page(hmm_pfns[i]);
> +		if (is_zone_device_page(page)) {
> +			addr[i] = hmm_pfns[i] << PAGE_SHIFT;

This is not the real physical address. It's a fake address from
devm_request_free_mem_region. You need to translate that to an actual
physical address by subtracting pgmap->range.start and adding
adev->vm_manager.vram_base_offset.

Right now it looks like you don't use this address for VRAM at all. You
calculate the address from the ttm_res in the svm_range. This works OK
as long as we do the VRAM allocation per SVM range. Eventually it may be
useful to calculate the correct address here from the PFN. Then we can
be more flexible in the VRAM management and may not need a reference to
the BO in the svm_range at all any more.
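
For illustration, a minimal sketch of that translation (untested; it
assumes the pgmap is reachable as page->pgmap, and that adev is
available in this scope, which it currently is not since
svm_range_dma_map_dev only receives a struct device *):

    page = hmm_pfn_to_page(hmm_pfns[i]);
    if (is_zone_device_page(page)) {
        unsigned long pfn_addr = hmm_pfns[i] << PAGE_SHIFT;

        /* Translate the fake address from devm_request_free_mem_region
         * into a real VRAM physical address. */
        addr[i] = pfn_addr - page->pgmap->range.start +
                  adev->vm_manager.vram_base_offset;
        addr[i] |= SVM_RANGE_VRAM_DOMAIN;
        continue;
    }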


> +			addr[i] |= SVM_RANGE_VRAM_DOMAIN;
> +			pr_debug("vram address detected: 0x%llx\n", addr[i] >> PAGE_SHIFT);
> +			continue;
> +		}
>  		addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
>  		r = dma_mapping_error(dev, addr[i]);
>  		if (r) {
> @@ -1131,33 +1137,49 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  		     struct amdgpu_device *bo_adev, struct dma_fence **fence)
>  {
>  	struct amdgpu_bo_va bo_va;
> +	struct ttm_resource *ttm_res;
>  	uint64_t pte_flags;
> +	unsigned long last_start;
> +	int last_domain;
>  	int r = 0;
> +	int64_t i, j;
>  
>  	pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms, prange->start,
>  		 prange->last);
>  
> -	if (prange->svm_bo && prange->ttm_res) {
> +	if (prange->svm_bo && prange->ttm_res)
>  		bo_va.is_xgmi = amdgpu_xgmi_same_hive(adev, bo_adev);
> -		prange->mapping.bo_va = &bo_va;
> -	}
> -
> -	prange->mapping.start = prange->start;
> -	prange->mapping.last = prange->last;
> -	prange->mapping.offset = prange->offset;
> -	pte_flags = svm_range_get_pte_flags(adev, prange);
>  
> -	r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false, NULL,
> -					prange->mapping.start,
> -					prange->mapping.last, pte_flags,
> -					prange->mapping.offset,
> -					prange->ttm_res ?
> -						prange->ttm_res->mm_node : NULL,
> -					dma_addr, &vm->last_update);
> -	if (r) {
> -		pr_debug("failed %d to map to gpu 0x%lx\n", r, prange->start);
> -		goto out;
> +	ttm_res = prange->ttm_res;

This is not needed here. See below.


> +	last_start = prange->start;
> +	for (i = j = 0; i < prange->npages; i++) {
> +		last_domain = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN;
> +		if ((last_start + j) < prange->last &&

I think this should be true: prange->start + i == last_start + j. So you
don't need j.
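
In other words, the run detection could drop j entirely (sketch):

    if ((prange->start + i) < prange->last &&
        last_domain == (dma_addr[i + 1] & SVM_RANGE_VRAM_DOMAIN))
        continue;

and then use prange->start + i wherever last_start + j appears below.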


> +		    last_domain == (dma_addr[i + 1] & SVM_RANGE_VRAM_DOMAIN)) {
> +			j++;
> +			continue;
> +		}
> +		pr_debug("Mapping range [0x%lx 0x%llx] on domain: %s\n",
> +			 last_start, last_start + j, last_domain ? "GPU" : "CPU");
> +		prange->ttm_res = last_domain ? ttm_res : NULL;

You're using prange->ttm_res as a temporary variable. Use the local
variable ttm_res as your temporary. Then this becomes:

    ttm_res = last_domain ? prange->ttm_res : NULL;

Then use ttm_res instead of prange->ttm_res below.


> +		pte_flags = svm_range_get_pte_flags(adev, prange);
> +		r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false, NULL,
> +						last_start,
> +						last_start + j, pte_flags,

This could also use prange->start + i.


> +						((last_start - prange->start) << PAGE_SHIFT),

I think you still need to add prange->offset here. Otherwise you break
the case where one BO is shared by several ranges.
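
For example (hedged; assuming prange->offset has the same byte units
the old code passed straight through as the offset argument):

    prange->offset + ((last_start - prange->start) << PAGE_SHIFT),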


> +						prange->ttm_res ?
> +							prange->ttm_res->mm_node : NULL,

This looks like you need to rebase on Christian's latest changes.
amdgpu_vm_bo_update_mapping now takes a ttm_res parameter.


> +						prange->ttm_res ? NULL :
> +						dma_addr,

This line-break is really jarring. Put dma_addr on the same line as the
rest of the ternary expression.


> +						&vm->last_update);
> +		if (r) {
> +			pr_debug("failed %d to map to gpu 0x%lx\n", r, prange->start);
> +			goto out;
> +		}
> +		last_start += j + 1;

last_start = prange->start + i + 1;


> +		j = 0;
>  	}
> +	prange->ttm_res = ttm_res;

Remove this. See my comment about making ttm_res the temporary variable.
We should not misuse our data structures for temporary variables.
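
Putting all of the above together, the loop could look roughly like
this. This is only an untested sketch: it assumes the rebased
amdgpu_vm_bo_update_mapping() that takes a struct ttm_resource *
instead of the mm_node pointer, and a hypothetical extra domain
argument to svm_range_get_pte_flags(), since prange->ttm_res would no
longer tell it whether the current run is in VRAM:

    last_start = prange->start;
    for (i = 0; i < prange->npages; i++) {
        last_domain = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN;
        if ((prange->start + i) < prange->last &&
            last_domain == (dma_addr[i + 1] & SVM_RANGE_VRAM_DOMAIN))
            continue;

        /* ttm_res is the local temporary; prange->ttm_res is untouched */
        ttm_res = last_domain ? prange->ttm_res : NULL;
        pte_flags = svm_range_get_pte_flags(adev, prange, last_domain);
        r = amdgpu_vm_bo_update_mapping(adev, bo_adev, vm, false, false,
                        NULL, last_start, prange->start + i, pte_flags,
                        prange->offset +
                        ((last_start - prange->start) << PAGE_SHIFT),
                        ttm_res, ttm_res ? NULL : dma_addr,
                        &vm->last_update);
        if (r) {
            pr_debug("failed %d to map to gpu 0x%lx\n", r, prange->start);
            goto out;
        }
        last_start = prange->start + i + 1;
    }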

Regards,
  Felix


>  
>  	r = amdgpu_vm_update_pdes(adev, vm, false);
>  	if (r) {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> index 573f984b81fe..bb85288cd658 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> @@ -35,6 +35,8 @@
>  #include "amdgpu.h"
>  #include "kfd_priv.h"
>  
> +#define SVM_RANGE_VRAM_DOMAIN (1UL << 0)
> +
>  struct svm_range_bo {
>  	struct amdgpu_bo		*bo;
>  	struct kref			kref;

