[PATCH] drm/amdgpu: Add kernel parameter to force no xgmi
Sierra Guiza, Alejandro (Alex)
alex.sierra at amd.com
Wed Oct 28 18:08:38 UTC 2020
On 10/28/2020 9:58 AM, Christian König wrote:
> On 28.10.20 at 15:55, Alex Sierra wrote:
>> By enabling this parameter, the system will be forced to use pcie
>> interface only for p2p transactions.
>
> Better name that amdgpu_xgmi with a default value of enabled.
>
> Or maybe add another bit value for amdgpu_vm_debug instead.
Ack
Regards,
Alex Sierra
>
>
>>
>> Signed-off-by: Alex Sierra <alex.sierra at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++
>> 3 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index ba65d4f2ab67..3645f00e9f61 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -188,6 +188,7 @@ extern int amdgpu_discovery;
>> extern int amdgpu_mes;
>> extern int amdgpu_noretry;
>> extern int amdgpu_force_asic_type;
>> +extern int amdgpu_force_no_xgmi;
>> #ifdef CONFIG_HSA_AMD
>> extern int sched_policy;
>> extern bool debug_evictions;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 1fe850e0a94d..0a5d97a84017 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2257,7 +2257,7 @@ static int amdgpu_device_ip_init(struct
>> amdgpu_device *adev)
>> if (r)
>> goto init_failed;
>> - if (adev->gmc.xgmi.num_physical_nodes > 1)
>> + if (!amdgpu_force_no_xgmi && adev->gmc.xgmi.num_physical_nodes > 1)
>
> Mhm, this will most likely cause problems. You still need to add the
> device to the hive because otherwise the GPU won't work.
What kind of problems? So far, I have validated this on a system with
multiple devices by running ./rocm_bandwidth_test -t, both with and
without the parameter set.
Regards,
Alex Sierra
>
> Apart from that sounds like a good idea in general.
>
> Christian.
>
>> amdgpu_xgmi_add_device(adev);
>> amdgpu_amdkfd_device_init(adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 4b78ecfd35f7..22485067cf31 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -160,6 +160,7 @@ int amdgpu_force_asic_type = -1;
>> int amdgpu_tmz = 0;
>> int amdgpu_reset_method = -1; /* auto */
>> int amdgpu_num_kcq = -1;
>> +int amdgpu_force_no_xgmi = 0;
>> struct amdgpu_mgpu_info mgpu_info = {
>> .mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
>> @@ -522,6 +523,14 @@ module_param_named(ras_enable,
>> amdgpu_ras_enable, int, 0444);
>> MODULE_PARM_DESC(ras_mask, "Mask of RAS features to enable (default
>> 0xffffffff), only valid when ras_enable == 1");
>> module_param_named(ras_mask, amdgpu_ras_mask, uint, 0444);
>> +/**
>> + * DOC: force_no_xgmi (uint)
>> + * Forces not to use xgmi interface (0 = disable, 1 = enable).
>> + * Default is 0 (disabled).
>> + */
>> +MODULE_PARM_DESC(force_no_xgmi, "Force not to use xgmi interface");
>> +module_param_named(force_no_xgmi, amdgpu_force_no_xgmi, int, 0600);
>> +
>> /**
>> * DOC: si_support (int)
>> * Set SI support driver. This parameter works after set config
>> CONFIG_DRM_AMDGPU_SI. For SI asic, when radeon driver is enabled,
>
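For reference, the rename suggested above (amdgpu_xgmi, default enabled)
would invert the polarity of the check in amdgpu_device_ip_init(). Below
is a minimal userspace sketch of that condition, not driver code. The
helper names should_add_to_hive() and xgmi_wanted() are hypothetical, and
the variable amdgpu_xgmi only stands in for the proposed parameter. The
sketch does not settle whether skipping amdgpu_xgmi_add_device() is safe,
which is the open question in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the renamed module parameter:
 * 1 = XGMI allowed (default), 0 = force PCIe only for p2p.
 * In the driver this would presumably be exposed through
 * module_param_named(); that wiring is not modeled here.
 */
int amdgpu_xgmi = 1;

/*
 * Hypothetical helper mirroring the condition from the patch, with
 * the polarity inverted to match the suggested parameter name.
 */
bool should_add_to_hive(int xgmi_enabled, unsigned int num_physical_nodes)
{
	return xgmi_enabled && num_physical_nodes > 1;
}

/* Convenience wrapper that reads the (modeled) parameter. */
bool xgmi_wanted(unsigned int num_physical_nodes)
{
	return should_add_to_hive(amdgpu_xgmi, num_physical_nodes);
}
```

With the default of 1 the behavior matches the code before the patch;
setting the parameter to 0 corresponds to force_no_xgmi=1 in the patch
as posted.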