[PATCH 1/2] drm/amdgpu: Acquire ttm locks for dmaunmap

Felix Kuehling <felix.kuehling at amd.com>
Wed Oct 11 20:31:50 UTC 2023


On 2023-10-11 14:22, David Francis wrote:
> dmaunmap can call ttm_bo_validate, which expects the
> ttm dma_resv to be held.
>
> Acquire the locks in amdgpu_amdkfd_gpuvm_dmaunmap_mem.
>
> Signed-off-by: David Francis <David.Francis at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 +++++++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         |  7 ++++++-
>   3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 3ad8dc523b42..dba4f6b7a2f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -302,7 +302,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev,
>   					  struct kgd_mem *mem, void *drm_priv);
>   int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
>   		struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv);
> -void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
> +int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
>   int amdgpu_amdkfd_gpuvm_sync_memory(
>   		struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
>   int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a15e59abe70a..808deec8aa58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2094,21 +2094,31 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>   	return ret;
>   }
>   
> -void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
> +int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
>   {
>   	struct kfd_mem_attachment *entry;
>   	struct amdgpu_vm *vm;
> +	struct bo_vm_reservation_context ctx;
> +	int ret;
>   
>   	vm = drm_priv_to_vm(drm_priv);
>   
>   	mutex_lock(&mem->lock);
>   
> +	ret = reserve_bo_and_cond_vms(mem, vm, BO_VM_MAPPED, &ctx);

Looks like you copied this from somewhere else. Do you really need to 
reserve the VM here, or just the BO? If you only need to reserve the BO, 
just use amdgpu_bo_reserve.
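
For example, a minimal sketch of the BO-only variant (untested, and it
assumes that reserving mem->bo is enough for everything that
kfd_mem_dmaunmap_attachment touches on the attachment BOs):

int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
{
	struct kfd_mem_attachment *entry;
	struct amdgpu_vm *vm = drm_priv_to_vm(drm_priv);
	int ret;

	mutex_lock(&mem->lock);

	/* Hold only the BO reservation; that is what ttm_bo_validate
	 * expects in the dmaunmap path.
	 */
	ret = amdgpu_bo_reserve(mem->bo, false);
	if (ret)
		goto out;

	list_for_each_entry(entry, &mem->attachments, list) {
		if (entry->bo_va->base.vm == vm)
			kfd_mem_dmaunmap_attachment(mem, entry);
	}

	amdgpu_bo_unreserve(mem->bo);
out:
	mutex_unlock(&mem->lock);
	return ret;
}

Note that the interruptible reserve can still fail with -ERESTARTSYS,
which is what the rest of my comments below are about.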


> +	if (ret)
> +		goto out;
> +
>   	list_for_each_entry(entry, &mem->attachments, list) {
>   		if (entry->bo_va->base.vm == vm)
>   			kfd_mem_dmaunmap_attachment(mem, entry);
>   	}
>   
> +	unreserve_bo_and_vms(&ctx, false, false);
> +
> +out:
>   	mutex_unlock(&mem->lock);
> +	return ret;
>   }
>   
>   int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 06988cf1db51..21d4e7d46238 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1442,7 +1442,11 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>   			kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT);
>   
>   		/* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */
> -		amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
> +		err = amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
> +		if (err) {

This is probably not correct. The only failure case I know of is that
the reservation is interrupted by a signal and returns -ERESTARTSYS.
When this happens, the ioctl will automatically get restarted after 
handling the signal. We need to make sure that we handle this restart 
case correctly. When this happens, all the unmapping is already done 
from the first time the function ran. All we need to repeat is the dmaunmap.

We already handled this scenario for amdgpu_amdkfd_gpuvm_sync_memory. 
Make sure it also works correctly for this new dmaunmap case. Basically, 
we don't want to accidentally dmaunmap the same thing twice. And we need 
to make sure that amdgpu_amdkfd_gpuvm_sync_memory is harmless if it's 
called multiple times due to restarts.
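
To illustrate what I mean (just a sketch, not actual code: the
"dma_mapped" flag below does not exist in struct kfd_mem_attachment, it
only stands in for whatever state lets us tell that an attachment is
still DMA-mapped):

	/* Restart-safe variant of the dmaunmap loop: skip attachments
	 * whose DMA mapping was already torn down by an earlier,
	 * interrupted run of this ioctl, so a restart never dmaunmaps
	 * the same thing twice.
	 */
	list_for_each_entry(entry, &mem->attachments, list) {
		if (entry->bo_va->base.vm != vm)
			continue;
		if (!entry->dma_mapped)	/* hypothetical flag, see above */
			continue;
		kfd_mem_dmaunmap_attachment(mem, entry);
		entry->dma_mapped = false;
	}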

There is a test in KFDTest specifically to try to recreate this scenario 
where amdgpu_amdkfd_gpuvm_sync_memory gets interrupted by a signal: 
KFDMemoryTest.SignalHandling. It probably won't be able to trigger a 
signal interrupting an amdgpu_bo_reserve, though, because that's very 
fast if there is no lock contention. So we may just have to rely on code 
review. Or you can try to recreate the scenario manually by returning 
-ERESTARTSYS under some conditions.
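
Something like this throw-away hack (never to be merged), placed right
after taking mem->lock and before the reservation, could force the
restart path:

	/* Debug hack: pretend the reservation was interrupted whenever a
	 * signal is pending, even though an uncontended reserve would
	 * normally succeed. With KFDMemoryTest.SignalHandling (or any
	 * signal sent during the ioctl) the kernel then restarts the
	 * ioctl and the second pass exercises the already-dmaunmapped
	 * case. Needs signal_pending() from <linux/sched/signal.h>.
	 */
	if (signal_pending(current)) {
		ret = -ERESTARTSYS;
		goto out;
	}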

Regards,
   Felix


> +			pr_debug("DMA unmapping failed\n");
> +			goto dmaunmap_failed;
> +		}
>   	}
>   
>   	mutex_unlock(&p->mutex);
> @@ -1455,6 +1459,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>   get_mem_obj_from_handle_failed:
>   unmap_memory_from_gpu_failed:
>   sync_memory_failed:
> +dmaunmap_failed:
>   	mutex_unlock(&p->mutex);
>   copy_from_user_failed:
>   	kfree(devices_arr);

