[PATCH] drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain

Errabolu, Ramesh Ramesh.Errabolu at amd.com
Tue Nov 9 05:13:14 UTC 2021


[AMD Official Use Only]

Responses in line

Regards,
Ramesh

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling at amd.com> 
Sent: Monday, November 8, 2021 10:51 PM
To: amd-gfx at lists.freedesktop.org; Errabolu, Ramesh <Ramesh.Errabolu at amd.com>
Subject: Re: [PATCH] drm/amdgpu: Pin MMIO/DOORBELL BO's in GTT domain

Am 2021-11-08 um 8:37 p.m. schrieb Ramesh Errabolu:
> MMIO/DOORBELL BOs encode control data and should be pinned in GTT 
> domain before enabling PCIe connected peer devices in accessing it

The PCIe connected peer device access isn't an issue on the upstream branch (yet). But in general, it is a good idea to pin these SG BOs.
There is no good reason to have them on the TTM LRU list. And they do reference fixed physical addresses.

See one more comment inline.


>
> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    | 25 +++++++++
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 55 
> +++++++++++++++++++
>  2 files changed, 80 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 4c1d6536a7a5..d9a1cfd7876f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -300,6 +300,31 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct amdgpu_device *adev,
>  				      uint64_t va, void *drm_priv,
>  				      struct kgd_mem **mem, uint64_t *size,
>  				      uint64_t *mmap_offset);
> +
> +/**
> + * amdgpu_amdkfd_gpuvm_pin_bo() - Pins a BO using following criteria
> + * @bo: Handle of buffer object being pinned
> + * @domain: Domain into which BO should be pinned
> + *
> + *   - USERPTR BOs are UNPINNABLE and will return error
> + *   - All other BO types (GTT, VRAM, MMIO and DOORBELL) will have their
> + *     PIN count incremented. It is valid to PIN a BO multiple times
> + *
> + * Return: ZERO if successful in pinning, Non-Zero in case of error.
> + * Will return -EINVAL if input BO parameter is a USERPTR type.
> + */
> +int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain);
> +
> +/**
> + * amdgpu_amdkfd_gpuvm_unpin_bo() - Unpins BO using following 
> +criteria
> + * @bo: Handle of buffer object being unpinned
> + *
> + *   - Is a illegal request for USERPTR BOs and is ignored
> + *   - All other BO types (GTT, VRAM, MMIO and DOORBELL) will have their
> + *     PIN count decremented. Calls to UNPIN must balance calls to PIN
> + */
> +void amdgpu_amdkfd_gpuvm_unpin_bo(struct amdgpu_bo *bo);
> +

These declarations aren't needed here. The functions should be static because they are only used in the same source file.

Ramesh: Removed them from the header file


>  int amdgpu_amdkfd_get_tile_config(struct amdgpu_device *adev,
>  				struct tile_config *config);
>  void amdgpu_amdkfd_ras_poison_consumption_handler(struct 
> amdgpu_device *adev); diff --git 
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 4fa814358552..f4ffc41873dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1299,6 +1299,36 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
>  	return ret;
>  }
>  
> +int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain)

Static

Ramesh: Updated function signature to make it static


> +{
> +	int ret = 0;
> +
> +	ret = amdgpu_bo_reserve(bo, false);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
> +	if (ret)
> +		pr_err("Error in Pinning BO to domain: %d\n", domain);
> +
> +	amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
> +	amdgpu_bo_unreserve(bo);
> +
> +	return ret;
> +}
> +
> +void amdgpu_amdkfd_gpuvm_unpin_bo(struct amdgpu_bo *bo)

static

Ramesh: Same as above

> +{
> +	int ret = 0;
> +
> +	ret = amdgpu_bo_reserve(bo, false);
> +	if (unlikely(ret))
> +		return;
> +
> +	amdgpu_bo_unpin(bo);
> +	amdgpu_bo_unreserve(bo);
> +}
> +
>  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev,
>  					   struct file *filp, u32 pasid,
>  					   void **process_info,
> @@ -1525,6 +1555,23 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>  	if (offset)
>  		*offset = amdgpu_bo_mmap_offset(bo);
>  
> +	if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> +			KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
> +		ret = amdgpu_amdkfd_bo_validate(bo, AMDGPU_GEM_DOMAIN_GTT, false);
> +		if (ret) {
> +			pr_err("Validating MMIO/DOORBELL BO during ALLOC FAILED\n");
> +			goto err_node_allow;

Actually, I think this is wrong. You need a new label before drm_vma_node_revoke to make sure the entry in the node permission structure is not leaked.

Ramesh: Good catch - using a new label - "err_pin_bo"

> +		}
> +
> +		ret = amdgpu_amdkfd_gpuvm_pin_bo(bo, AMDGPU_GEM_DOMAIN_GTT);
> +		if (ret) {
> +			pr_err("Pinning MMIO/DOORBELL BO during ALLOC FAILED\n");
> +			goto err_node_allow;

Same as above.

Regards,
  Felix


> +		}
> +		bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
> +		bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
> +	}
> +
>  	return 0;
>  
>  allocate_init_user_pages_failed:
> @@ -1561,6 +1608,14 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>  	bool is_imported = false;
>  
>  	mutex_lock(&mem->lock);
> +
> +	/* Unpin MMIO/DOORBELL BO's that were pinnned during allocation */
> +	if (mem->alloc_flags &
> +	    (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> +	     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
> +		amdgpu_amdkfd_gpuvm_unpin_bo(mem->bo);
> +	}
> +
>  	mapped_to_gpu_memory = mem->mapped_to_gpu_memory;
>  	is_imported = mem->is_imported;
>  	mutex_unlock(&mem->lock);


More information about the amd-gfx mailing list