[PATCH] drm/amdkfd: enable heavy-weight TLB flush on Arcturus

Felix Kuehling felix.kuehling at amd.com
Tue Jan 18 21:50:32 UTC 2022


On 2022-01-18 at 4:28 p.m., Eric Huang wrote:
> SDMA FW fixes the hang issue for adding heavy-weight TLB
> flush on Arcturus, so we can enable it.
>
> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 ++++++---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         | 4 +++-
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index a64cbbd943ba..f1fed0fc31d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1892,10 +1892,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>  				true);
>  	ret = unreserve_bo_and_vms(&ctx, false, false);
>  
> -	/* Only apply no TLB flush on Aldebaran to
> -	 * workaround regressions on other Asics.
> +	/* Only apply no TLB flush on Aldebaran and Arcturus
> +	 * to workaround regressions on other Asics.
>  	 */
> -	if (table_freed && (adev->asic_type != CHIP_ALDEBARAN))
> +	if (table_freed &&
> +	    (adev->asic_type != CHIP_ALDEBARAN) &&
> +	    (adev->asic_type != CHIP_ARCTURUS ||
> +	     adev->sdma.instance[0].fw_version < 18))
>  		*table_freed = true;

Can we move this check into the caller in kfd_chardev.c? That avoids
scattering these conditions across several places.


>  
>  	goto out;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b570c0454ce9..0e4a76dca809 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1806,7 +1806,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>  	}
>  	mutex_unlock(&p->mutex);
>  
> -	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) {
> +	if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
> +	    (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
> +	     dev->adev->sdma.instance[0].fw_version >= 18)) {

Maybe put this into a helper function that can be used here and in
kfd_ioctl_map_memory_to_gpu. I also saw this being duplicated in the
upcoming CRIU patches. And we may want to adopt this in the SVM code as
well. Having one common helper makes sure we'll keep the TLB flushing
strategy consistent everywhere. Something like:

    bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev)
    {
    	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
    	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
    		dev->adev->sdma.instance[0].fw_version >= 18);
    }
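
To illustrate how one helper could serve both the map and the unmap path,
here is a minimal user-space sketch. It is not driver code: struct
mock_kfd_dev, its fields, mock_flush_tlb_after_map() and the IP_VERSION()
packing are stand-ins assumed for the example, and the map-side rule is
only my reading of the condition in the quoted hunk (flush if a table was
freed, or if the ASIC does not use the flush-after-unmap strategy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel macro; the bit packing is an assumption here. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical, flattened replacement for struct kfd_dev. */
struct mock_kfd_dev {
	uint32_t gc_version;       /* GC IP version of the device */
	uint32_t sdma0_fw_version; /* models adev->sdma.instance[0].fw_version */
};

#define KFD_GC_VERSION(dev) ((dev)->gc_version)

/* Single place that encodes which ASICs flush the TLB after unmap. */
static bool kfd_flush_tlb_after_unmap(const struct mock_kfd_dev *dev)
{
	return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) ||
	       (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) &&
		dev->sdma0_fw_version >= 18);
}

/*
 * Hypothetical map-side caller check: the callee only reports whether a
 * page table was freed, and the caller applies the per-ASIC policy.
 */
static bool mock_flush_tlb_after_map(const struct mock_kfd_dev *dev,
				     bool table_freed)
{
	return table_freed || !kfd_flush_tlb_after_unmap(dev);
}
```

With this split, the ASIC and firmware checks live only in the helper, and
each call site reduces to a single boolean test.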

Regards,
  Felix


>  		err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev,
>  				(struct kgd_mem *) mem, true);
>  		if (err) {

