[PATCH] drm/amdgpu: Add kernel parameter to force no xgmi

Luben Tuikov luben.tuikov at amd.com
Thu Oct 29 17:14:50 UTC 2020


On 2020-10-28 15:09, Sierra Guiza, Alejandro (Alex) wrote:
> 
> Please ignore this patch; it should be in a different branch, as PCIe p2p is not supported upstream.

No problem, but if you do add it elsewhere, please use something more specific, like

	amdgpu_xgmi_p2p

as the (positive-controlled) flag, since more generic flags could be added later
to control broader behaviour.

Regards,
Luben

> 
> Regards,
> Alex Sierra
> 
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Sierra Guiza, Alejandro (Alex)
>> Sent: Wednesday, October 28, 2020 1:09 PM
>> To: Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Add kernel parameter to force no xgmi
>>
>>
>> On 10/28/2020 9:58 AM, Christian König wrote:
>>> Am 28.10.20 um 15:55 schrieb Alex Sierra:
>>>> Enabling this parameter forces the system to use only the PCIe
>>>> interface for p2p transactions.
>>>
>>> Better to name that amdgpu_xgmi, with a default value of enabled.
>>>
>>> Or maybe add another bit value for amdgpu_vm_debug instead.
>>
>> Ack
>>
>> Regards,
>> Alex Sierra
>>
>>>
>>>
>>>>
>>>> Signed-off-by: Alex Sierra <alex.sierra at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 1 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 9 +++++++++
>>>>   3 files changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index ba65d4f2ab67..3645f00e9f61 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -188,6 +188,7 @@ extern int amdgpu_discovery;
>>>>   extern int amdgpu_mes;
>>>>   extern int amdgpu_noretry;
>>>>   extern int amdgpu_force_asic_type;
>>>> +extern int amdgpu_force_no_xgmi;
>>>>   #ifdef CONFIG_HSA_AMD
>>>>   extern int sched_policy;
>>>>   extern bool debug_evictions;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 1fe850e0a94d..0a5d97a84017 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -2257,7 +2257,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>       if (r)
>>>>           goto init_failed;
>>>>
>>>> -    if (adev->gmc.xgmi.num_physical_nodes > 1)
>>>> +    if (!amdgpu_force_no_xgmi && adev->gmc.xgmi.num_physical_nodes > 1)
>>>
>>> Mhm, this will most likely cause problems. You still need to add the
>>> device to the hive, because otherwise the GPU won't work.
>>
>> What kind of problems? So far, I have validated this on a system with
>> multiple devices by running ./rocm_bandwidth_test -t, with and without
>> the parameter set.
>>
>> Regards,
>> Alex Sierra
>>
>>>
>>> Apart from that, it sounds like a good idea in general.
>>>
>>> Christian.
>>>
>>>>           amdgpu_xgmi_add_device(adev);
>>>>       amdgpu_amdkfd_device_init(adev);
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index 4b78ecfd35f7..22485067cf31 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -160,6 +160,7 @@ int amdgpu_force_asic_type = -1;
>>>>   int amdgpu_tmz = 0;
>>>>   int amdgpu_reset_method = -1; /* auto */
>>>>   int amdgpu_num_kcq = -1;
>>>> +int amdgpu_force_no_xgmi = 0;
>>>>
>>>>   struct amdgpu_mgpu_info mgpu_info = {
>>>>       .mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
>>>> @@ -522,6 +523,14 @@ module_param_named(ras_enable, amdgpu_ras_enable, int, 0444);
>>>>   MODULE_PARM_DESC(ras_mask, "Mask of RAS features to enable (default 0xffffffff), only valid when ras_enable == 1");
>>>>   module_param_named(ras_mask, amdgpu_ras_mask, uint, 0444);
>>>>
>>>> +/**
>>>> + * DOC: force_no_xgmi (int)
>>>> + * Forces the driver not to use the XGMI interface (0 = disabled, 1 = enabled).
>>>> + * Default is 0 (disabled).
>>>> + */
>>>> +MODULE_PARM_DESC(force_no_xgmi, "Force the driver not to use the XGMI interface");
>>>> +module_param_named(force_no_xgmi, amdgpu_force_no_xgmi, int, 0600);
>>>> +
>>>>   /**
>>>>    * DOC: si_support (int)
>>>>    * Set SI support driver. This parameter works after set config CONFIG_DRM_AMDGPU_SI. For SI asic, when radeon driver is enabled,
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


