[PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

Felix Kuehling felix.kuehling at amd.com
Wed Aug 19 15:38:34 UTC 2020


On 2020-08-19 at 7:06 a.m., Huang Rui wrote:
> We still have a few iommu issues which need to address, so force raven
> as "dgpu" path for the moment.
>
> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> or ACPI CRAT table not correct.
>
> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> v3: Align with existed thunk, don't change the way of raven, only renoir
>     will use "dgpu" path by default.
>
> Signed-off-by: Huang Rui <ray.huang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>  5 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index a9a4319c24ae..189f9d7e190d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>   * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>   * table to get information about AMD APUs. This option can serve as a workaround on
>   * systems with a broken CRAT table.
> + *
> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
> + * whehter use CRAT)
>   */
>  int ignore_crat;
>  module_param(ignore_crat, int, 0444);
>  MODULE_PARM_DESC(ignore_crat,
> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>  
>  /**
>   * DOC: halt_if_hws_hang (int)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 59557e3e206a..f8346d4402e2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -22,6 +22,7 @@
>  
>  #include <linux/pci.h>
>  #include <linux/acpi.h>
> +#include <asm/processor.h>
>  #include "kfd_crat.h"
>  #include "kfd_priv.h"
>  #include "kfd_topology.h"
> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>  	return 0;
>  }
>  
> +
> +#ifdef CONFIG_ACPI
> +static void kfd_setup_ignore_crat_option(void)
> +{
> +
> +	if (ignore_crat)
> +		return;
> +
> +#ifndef KFD_SUPPORT_IOMMU_V2
> +	ignore_crat = 1;
> +#else
> +	ignore_crat = 0;
> +#endif
> +
> +	/* Renoir use the fallback path to align with existed thunk */

Are you sure you need special code for Renoir here? For Renoir the
dev->device_info already treats it as a dGPU and always has.

I don't like the whole idea of changing the value of a module parameter,
because it is global and visible to the user through sysfs. Instead, if
you need to override the value of ignore_crat to consider other
conditions, I think kfd_device_use_iommu_v2 and
kfd_create_crat_image_acpi would be the right places to do it.

To avoid duplicating the conditions, you could add a helper function
bool kfd_ignore_crat(void) that can be called instead of using the
ignore_crat parameter directly. kfd_setup_ignore_crat_option, which
changes the global module parameter, should then be removed.
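
A minimal sketch of what such a helper might look like, written as a
standalone snippet rather than as actual kernel code. The static
ignore_crat variable here is only a stand-in for the module parameter
declared in amdgpu_drv.c, and the body is an assumption based on the
conditions in the quoted patch. The point is that the helper only reads
the parameter and never writes to it:

```c
#include <stdbool.h>

/* Stand-in for the module parameter from amdgpu_drv.c (assumption:
 * in the kernel this would be the existing extern int ignore_crat). */
static int ignore_crat;

/*
 * Hypothetical helper: compute the effective "ignore CRAT" decision
 * from the user-supplied parameter and the build configuration,
 * without modifying the parameter itself.
 */
static bool kfd_ignore_crat(void)
{
	/* The user explicitly asked to ignore the CRAT table. */
	if (ignore_crat)
		return true;

#ifndef KFD_SUPPORT_IOMMU_V2
	/* Without IOMMUv2 support, fall back to the dGPU path. */
	return true;
#else
	return false;
#endif
}
```

Callers such as kfd_create_crat_image_acpi would then test
kfd_ignore_crat() instead of ignore_crat, so the value the user sees in
sysfs stays whatever was passed on the command line.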


> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> +	    boot_cpu_data.x86 == 0x17 &&
> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
> +		ignore_crat = 1;
> +	}
> +
> +	return;
> +}
> +
>  /*
>   * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>   * copies CRAT from ACPI (if available).
> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>   *
>   *	Return 0 if successful else return error code
>   */
> -#ifdef CONFIG_ACPI
>  int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>  {
>  	struct acpi_table_header *crat_table;
> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>  		return -EINVAL;
>  	}
>  
> +	kfd_setup_ignore_crat_option();
> +
>  	if (ignore_crat) {
>  		pr_info("CRAT table disabled by module option\n");
>  		return -ENODATA;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 2c030c2b5b8d..dab44951c4d8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>  	.num_xgmi_sdma_engines = 0,
>  	.num_sdma_queues_per_engine = 2,
>  };
> +#endif
>  
>  static const struct kfd_device_info raven_device_info = {
>  	.asic_family = CHIP_RAVEN,
> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>  	.num_xgmi_sdma_engines = 0,
>  	.num_sdma_queues_per_engine = 2,
>  };
> -#endif
>  
>  static const struct kfd_device_info hawaii_device_info = {
>  	.asic_family = CHIP_HAWAII,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 82f955750e75..4b6e7ef7a71c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>  
>  static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>  {
> -	return dev && dev->device_info->needs_iommu_device;
> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>  }
>  
>  /* Debugfs */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 4b29815e9205..b92ce75a4c53 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>  						    COMPUTE_UNIT_CPU, NULL,
>  						    proximity_domain);
>  		cpu_only_node = 1;
> +		ignore_crat = 1;

Don't change the global variable. I think you're doing this here in case
the CRAT table is broken and contains no GPU info. Maybe we need to add
a new flag "use_iommu_v2" to the kfd_dev structure to handle this.
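
A rough sketch of the per-device flag idea, again as a standalone
snippet. The struct definitions are heavily reduced stand-ins for the
real ones in kfd_priv.h, and kfd_dev_init_iommu_v2 and its crat_ignored
argument are hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for the real structures in kfd_priv.h. */
struct kfd_device_info {
	bool needs_iommu_device;
};

struct kfd_dev {
	const struct kfd_device_info *device_info;
	bool use_iommu_v2;	/* hypothetical per-device flag */
};

/*
 * Hypothetical init step: decide once per device whether IOMMUv2 is
 * used, based on the device info and on whether CRAT is being ignored.
 */
static void kfd_dev_init_iommu_v2(struct kfd_dev *dev, bool crat_ignored)
{
	dev->use_iommu_v2 = !crat_ignored &&
			    dev->device_info->needs_iommu_device;
}

/* The existing query would then consult the per-device flag only. */
static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
{
	return dev && dev->use_iommu_v2;
}
```

With something like this, a broken CRAT table would only affect the
devices initialized afterwards, and no global state would need to change.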

Regards,
  Felix


>  		if (ret) {
>  			pr_err("Error creating VCRAT table for CPU\n");
>  			return ret;

