[PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)

Felix Kuehling felix.kuehling at amd.com
Thu Aug 20 13:22:09 UTC 2020


Am 2020-08-20 um 5:38 a.m. schrieb Huang Rui:
> On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote:
>> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>>>>> We still have a few iommu issues which need to address, so force raven
>>>>>> as "dgpu" path for the moment.
>>>>>>
>>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>>>>> or ACPI CRAT table not correct.
>>>>>>
>>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>>>>      will use "dgpu" path by default.
>>>>>>
>>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>>>>   5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>>    * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>>    * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>>    * systems with a broken CRAT table.
>>>>>> + *
>>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>>>>> + * whehter use CRAT)
>>>>>>    */
>>>>>>   int ignore_crat;
>>>>>>   module_param(ignore_crat, int, 0444);
>>>>>>   MODULE_PARM_DESC(ignore_crat,
>>>>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>>   
>>>>>>   /**
>>>>>>    * DOC: halt_if_hws_hang (int)
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> @@ -22,6 +22,7 @@
>>>>>>   
>>>>>>   #include <linux/pci.h>
>>>>>>   #include <linux/acpi.h>
>>>>>> +#include <asm/processor.h>
>>>>>>   #include "kfd_crat.h"
>>>>>>   #include "kfd_priv.h"
>>>>>>   #include "kfd_topology.h"
>>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>>   	return 0;
>>>>>>   }
>>>>>>   
>>>>>> +
>>>>>> +#ifdef CONFIG_ACPI
>>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>>> +{
>>>>>> +
>>>>>> +	if (ignore_crat)
>>>>>> +		return;
>>>>>> +
>>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>>> +	ignore_crat = 1;
>>>>>> +#else
>>>>>> +	ignore_crat = 0;
>>>>>> +#endif
>>>>>> +
>>>>>> +	/* Renoir use the fallback path to align with existed thunk */
>>>>> Are you sure you need special code for Renoir here? For Renoir the
>>>>> dev->device_info already treats it as a dGPU and always has.
>>>> Renoir also is an APU, in other words, we might have got the correct CRAT
>>>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
>>>> we had got CRAT table, the kfd would create an APU node. That's not
>>>> expected.
>>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT 
>>> table because gpu->device_info->needs_iommu_device is False for Renoir. 
>>> So Renoir will always show up in the topology as its own discrete GPU node.
>>>
>>> How does this work today? Renoir is already treated as a dGPU. But the 
>>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the 
>>> CRAT table still shows GPU cores?
>>>
>> Yes, Renoir works well. In fact, I found the problem while I was enabling
>> the dGPU path for raven before. Even I set needs_iommu_device as false in
>> raven's device info. The kfd still creates the APU node. (in v1 patch)
>>
>> Let me rollback to check it again.
>>
> Hi Felix,
>
> I double check it again. If Renoir's ACPI CRAT were good, we wouldn't
> create virtual CRAT table for Renoir at this moement. Then the pure CPU
> node is unable to be created. That conflicted with needs_iommu_device flag
> (we hardcode needs_iommu_device as false for renoir). Then will break the
> user mode driver. (please see below rocminfo)
>
> rocminfo: /libhsakmt/src/topology.c:1079: topology_sysfs_get_node_props: Assertion `props->EngineId.ui32.Major' failed.
>
> So we probably would better have a specific handling to make sure Renoir
> with virtual CRAT.

OK. That would be a solution that works for Renoir only.

I'd suggest trying a more general solution in node_show in
kfd_topology.c. If we see an APU node (that has CPU and GPU cores) with
no associated GPU (dev->gpu_id is 0 or dev->gpu is NULL), we can report
it as a pure CPU node to user mode by just reporting the simd_count as 0.

Regards,
  Felix


>
> Thanks,
> Ray


More information about the amd-gfx mailing list