[PATCH v3 2/3] drm/amdkfd: force raven as "dgpu" path (v3)
Felix Kuehling
felix.kuehling at amd.com
Thu Aug 20 13:22:09 UTC 2020
Am 2020-08-20 um 5:38 a.m. schrieb Huang Rui:
> On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote:
>> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui:
>>>>>> We still have a few iommu issues which need to address, so force raven
>>>>>> as "dgpu" path for the moment.
>>>>>>
>>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>>>>> or ACPI CRAT table not correct.
>>>>>>
>>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>>>> will use "dgpu" path by default.
>>>>>>
>>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 28 ++++++++++++++++++++++-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
>>>>>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 +
>>>>>> 5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>> * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>> * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>> * systems with a broken CRAT table.
>>>>>> + *
>>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>>>>> + * whehter use CRAT)
>>>>>> */
>>>>>> int ignore_crat;
>>>>>> module_param(ignore_crat, int, 0444);
>>>>>> MODULE_PARM_DESC(ignore_crat,
>>>>>> - "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>>> + "Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>>
>>>>>> /**
>>>>>> * DOC: halt_if_hws_hang (int)
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>>> @@ -22,6 +22,7 @@
>>>>>>
>>>>>> #include <linux/pci.h>
>>>>>> #include <linux/acpi.h>
>>>>>> +#include <asm/processor.h>
>>>>>> #include "kfd_crat.h"
>>>>>> #include "kfd_priv.h"
>>>>>> #include "kfd_topology.h"
>>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> +
>>>>>> +#ifdef CONFIG_ACPI
>>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>>> +{
>>>>>> +
>>>>>> + if (ignore_crat)
>>>>>> + return;
>>>>>> +
>>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>>> + ignore_crat = 1;
>>>>>> +#else
>>>>>> + ignore_crat = 0;
>>>>>> +#endif
>>>>>> +
>>>>>> + /* Renoir use the fallback path to align with existed thunk */
>>>>> Are you sure you need special code for Renoir here? For Renoir the
>>>>> dev->device_info already treats it as a dGPU and always has.
>>>> Renoir also is an APU, in other words, we might have got the correct CRAT
>>>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If
>>>> we had got CRAT table, the kfd would create an APU node. That's not
>>>> expected.
>>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT
>>> table because gpu->device_info->needs_iommu_device is False for Renoir.
>>> So Renoir will always show up in the topology as its own discrete GPU node.
>>>
>>> How does this work today? Renoir is already treated as a dGPU. But the
>>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the
>>> CRAT table still shows GPU cores?
>>>
>> Yes, Renoir works well. In fact, I found the problem while I was enabling
>> the dGPU path for raven before. Even I set needs_iommu_device as false in
>> raven's device info. The kfd still creates the APU node. (in v1 patch)
>>
>> Let me rollback to check it again.
>>
> Hi Felix,
>
> I double check it again. If Renoir's ACPI CRAT were good, we wouldn't
> create virtual CRAT table for Renoir at this moement. Then the pure CPU
> node is unable to be created. That conflicted with needs_iommu_device flag
> (we hardcode needs_iommu_device as false for renoir). Then will break the
> user mode driver. (please see below rocminfo)
>
> rocminfo: /libhsakmt/src/topology.c:1079: topology_sysfs_get_node_props: Assertion `props->EngineId.ui32.Major' failed.
>
> So we probably would better have a specific handling to make sure Renoir
> with virtual CRAT.
OK. That would be a solution that works for Renoir only.
I'd suggest trying a more general solution in node_show in
kfd_topology.c. If we see an APU node (that has CPU and GPU cores) with
no associated GPU (dev->gpu_id is 0 or dev->gpu is NULL), we can report
it as a pure CPU node to user mode by just reporting the simd_count as 0.
Regards,
Felix
>
> Thanks,
> Ray
More information about the amd-gfx
mailing list