[PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

Felix Kuehling felix.kuehling at amd.com
Thu Aug 20 04:05:56 UTC 2020


Am 2020-08-19 um 11:09 p.m. schrieb Huang Rui:
> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>>>> We still have a few IOMMU issues that need to be addressed, so force Raven
>>>>> onto the "dgpu" path for the moment.
>>>>>
>>>>> This adds a fallback path that bypasses the IOMMU if IOMMUv2 is disabled
>>>>> or the ACPI CRAT table is not correct.
>>>>>
>>>>> v2: Use the ignore_crat parameter to decide whether to use IOMMUv2.
>>>>> v3: Align with the existing thunk: don't change the Raven behavior; only
>>>>>      Renoir will use the "dgpu" path by default.
>>>>>
>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>>>   5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>    * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>    * systems with a broken CRAT table.
>>>>> + *
>>>>> + * Default is auto (decide whether to use CRAT according to the ASIC type,
>>>>> + * IOMMUv2 support, and the CRAT table)
>>>>>    */
>>>>>   int ignore_crat;
>>>>>   module_param(ignore_crat, int, 0444);
>>>>>   MODULE_PARM_DESC(ignore_crat,
>>>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>   
>>>>>   /**
>>>>>    * DOC: halt_if_hws_hang (int)
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> @@ -22,6 +22,7 @@
>>>>>   
>>>>>   #include <linux/pci.h>
>>>>>   #include <linux/acpi.h>
>>>>> +#include <asm/processor.h>
>>>>>   #include "kfd_crat.h"
>>>>>   #include "kfd_priv.h"
>>>>>   #include "kfd_topology.h"
>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>   	return 0;
>>>>>   }
>>>>>   
>>>>> +
>>>>> +#ifdef CONFIG_ACPI
>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>> +{
>>>>> +
>>>>> +	if (ignore_crat)
>>>>> +		return;
>>>>> +
>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>> +	ignore_crat = 1;
>>>>> +#else
>>>>> +	ignore_crat = 0;
>>>>> +#endif
>>>>> +
>>>>> +	/* Renoir uses the fallback path to align with the existing thunk */
>>>> Are you sure you need special code for Renoir here? For Renoir the
>>>> dev->device_info already treats it as a dGPU and always has.
>>> Renoir is also an APU, so we might get a correct CRAT table from the
>>> SBIOS (so far, the SBIOS CRAT table for Renoir is broken). If we got a
>>> CRAT table, KFD would create an APU node. That's not expected.
>> kfd_assign_gpu will not assign a Renoir GPU to the APU node from the CRAT 
>> table because gpu->device_info->needs_iommu_device is false for Renoir. 
>> So Renoir will always show up in the topology as its own discrete GPU node.
>>
>> How does this work today? Renoir is already treated as a dGPU. But the 
>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
>> CRAT table still shows GPU cores?
>>
>> Regards,
>>    Felix
>>
>>
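One way to answer the question above on a running system is to read simd_count from node 0's properties file. The following is a minimal userspace sketch, assuming the file holds one "name value" pair per line (as the KFD sysfs topology prints it); the helper names kfd_prop_lookup and kfd_node_prop are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up `key` in the text of a KFD topology
 * "properties" file (one "name value" pair per line).
 * Returns the value, or -1 if the key is absent.
 */
long long kfd_prop_lookup(const char *buf, const char *key)
{
	size_t klen;
	const char *line = buf;

	assert(buf != NULL && key != NULL);
	klen = strlen(key);

	while (line && *line) {
		/* Require a space after the name so "simd" does not match "simd_count". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return strtoll(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next entry */
	}
	return -1;
}

/* Hypothetical helper: read node <node>'s properties from sysfs and look up `key`. */
long long kfd_node_prop(int node, const char *key)
{
	char path[128];
	char buf[4096];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/kfd/kfd/topology/nodes/%d/properties", node);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return kfd_prop_lookup(buf, key);
}
```

For example, a nonzero kfd_node_prop(0, "simd_count") would suggest that the CRAT-derived CPU node still advertises GPU cores. Note that -1 means "missing or unreadable", not a real value.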
>>>> I don't like the whole idea of changing the value of a module parameter,
>>>> because it is global and visible to the user through sysfs. Instead, if
>>>> you need to override the value of ignore_crat to consider other
>>>> conditions, I think kfd_device_use_iommu_v2 and
>>>> kfd_create_crat_image_acpi would be the right place to do it.
>>>>
>>>> To avoid duplicating the conditions, you could add a helper function
>>>> bool kfd_ignore_crat(void) that can be called instead of using the
>>>> ignore_crat parameter directly. This function, changing the global
>>>> module parameter, should be removed.
>>> That makes sense. Will update it in next version.
>>>
>>>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>>>>> +	    boot_cpu_data.x86 == 0x17 &&
>>>>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
>>>>> +		ignore_crat = 1;
>>>>> +	}
>>>>> +
>>>>> +	return;
>>>>> +}
>>>>> +
>>>>>   /*
>>>>>    * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>>>>>    * copies CRAT from ACPI (if available).
>>>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>    *
>>>>>    *	Return 0 if successful else return error code
>>>>>    */
>>>>> -#ifdef CONFIG_ACPI
>>>>>   int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>>>   {
>>>>>   	struct acpi_table_header *crat_table;
>>>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>>>   		return -EINVAL;
>>>>>   	}
>>>>>   
>>>>> +	kfd_setup_ignore_crat_option();
>>>>> +
>>>>>   	if (ignore_crat) {
>>>>>   		pr_info("CRAT table disabled by module option\n");
>>>>>   		return -ENODATA;
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> index 2c030c2b5b8d..dab44951c4d8 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>>>>>   	.num_xgmi_sdma_engines = 0,
>>>>>   	.num_sdma_queues_per_engine = 2,
>>>>>   };
>>>>> +#endif
>>>>>   
>>>>>   static const struct kfd_device_info raven_device_info = {
>>>>>   	.asic_family = CHIP_RAVEN,
>>>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>>>>>   	.num_xgmi_sdma_engines = 0,
>>>>>   	.num_sdma_queues_per_engine = 2,
>>>>>   };
>>>>> -#endif
>>>>>   
>>>>>   static const struct kfd_device_info hawaii_device_info = {
>>>>>   	.asic_family = CHIP_HAWAII,
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> index 82f955750e75..4b6e7ef7a71c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>>>>>   
>>>>>   static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>>>>>   {
>>>>> -	return dev && dev->device_info->needs_iommu_device;
>>>>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>>>>>   }
>>>>>   
>>>>>   /* Debugfs */
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> index 4b29815e9205..b92ce75a4c53 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>>>>>   						    COMPUTE_UNIT_CPU, NULL,
>>>>>   						    proximity_domain);
>>>>>   		cpu_only_node = 1;
>>>>> +		ignore_crat = 1;
>>>> Don't change the global variable. I think you're doing this here in case
>>>> the CRAT table is broken and contains no GPU info. Maybe we need to add
>>>> a new flag "use_iommu_v2" into the kfd_dev structure to handle this.
>>>>
> I just found that kfd_dev is not initialized yet at this point, so we may be
> unable to use a flag in kfd_dev.

I see. This is very early during module init. By the time you get here, you
have already failed to read the ACPI CRAT table and created a VCRAT for the
CPU with no GPU cores.

If you wanted to add a per-device "use_iommu_v2" flag, you could
probably set it in kfd_assign_gpu when it assigns a KFD device to a
node with CPU cores.
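A rough sketch of that idea, using heavily simplified, hypothetical stand-ins for struct kfd_dev and struct kfd_topology_device (a real version would walk topology_device_list under topology_lock). The use_iommu_v2 field and the *_sketch functions are hypothetical. The flag is only set when an APU is matched to a CRAT node that also has CPU cores; in the VCRAT fallback case the CPU node has no SIMDs, no match is found, the flag stays false, and the caller would create a separate node for the GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the KFD structures. */
struct kfd_device_info {
	bool needs_iommu_device;
};

struct kfd_dev {
	const struct kfd_device_info *device_info;
	bool use_iommu_v2;	/* hypothetical per-device flag */
};

struct kfd_topology_device {
	unsigned int cpu_cores_count;
	unsigned int simd_count;
	struct kfd_dev *gpu;
};

/* Roughly mirrors kfd_assign_gpu's matching and records the IOMMUv2 decision. */
struct kfd_topology_device *
assign_gpu_sketch(struct kfd_dev *gpu, struct kfd_topology_device *nodes,
		  size_t n)
{
	assert(gpu != NULL);
	gpu->use_iommu_v2 = false;

	for (size_t i = 0; i < n; i++) {
		struct kfd_topology_device *dev = &nodes[i];

		/* Discrete GPUs never share a node with CPU cores. */
		if (!gpu->device_info->needs_iommu_device &&
		    dev->cpu_cores_count)
			continue;

		if (!dev->gpu && dev->simd_count > 0) {
			dev->gpu = gpu;
			/* Only an APU placed on a CPU node uses IOMMUv2. */
			gpu->use_iommu_v2 = dev->cpu_cores_count > 0;
			return dev;
		}
	}
	return NULL;	/* caller would create a new (VCRAT) node for this GPU */
}

/* What kfd_device_use_iommu_v2() could check instead of ignore_crat. */
bool device_use_iommu_v2_sketch(const struct kfd_dev *dev)
{
	return dev && dev->use_iommu_v2;
}
```

Because the decision is made per device at assignment time, the global ignore_crat parameter never needs to be modified after kfd_topology_init falls back to a CPU-only VCRAT.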

Regards,
  Felix


>
> Thanks,
> Ray

