[PATCH v2] drm/amdgpu: Inform if PCIe based P2P links are not available

Errabolu, Ramesh Ramesh.Errabolu at amd.com
Wed Nov 6 22:35:42 UTC 2024



Good suggestion.

This message should only print when both of the following hold for the two devices (provider and client):
  - They are not behind the same root complex
  - The host bridge connecting them is not whitelisted
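
The two conditions above combine as a simple conjunction. A minimal sketch, assuming the topology facts are already known as plain flags; the helper name and both parameters are hypothetical and are not part of the patch or of the PCI core:

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only: the flags stand in for
 * whatever topology checks the kernel actually performs.
 * Returns true when the "P2P not available" message should be printed.
 */
static bool p2p_message_needed(bool same_root_complex,
			       bool host_bridge_whitelisted)
{
	/* Print only when neither path to the peer is usable. */
	return !same_root_complex && !host_bridge_whitelisted;
}
```

With this shape, a pair of devices sharing a root complex, or a pair joined by a whitelisted host bridge, stays silent.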

Will update the patch and post it for review.

Regards,
Ramesh

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling at amd.com>
Sent: Wednesday, November 6, 2024 3:50 PM
To: Errabolu, Ramesh <Ramesh.Errabolu at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH v2] drm/amdgpu: Inform if PCIe based P2P links are not available


On 2024-11-05 20:19, Ramesh Errabolu wrote:
> Raise an info message in kernel log if PCIe root complex determines
> that an AMD GPU device D<i> cannot have P2P communication with another
> AMD GPU device D<j>
>
> Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 230c24638a34..ab304a2c4aaf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -6222,6 +6222,8 @@ bool amdgpu_device_is_peer_accessible(struct amdgpu_device *adev,
>       bool p2p_access =
>               !adev->gmc.xgmi.connected_to_cpu &&
>               !(pci_p2pdma_distance(adev->pdev, peer_adev->dev, false) < 0);
> +     if (!p2p_access)
> +             pr_info("PCIe based P2P links are not available\n");

This message would be a lot more useful if it told you which two devices are affected. You can use dev_info to have it print the PCIe BDF of adev. Then you only need to print additional information for peer_adev using pci_name(peer_adev->pdev).
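
A rough userspace sketch of what that could look like, so the resulting log line is easy to picture. The struct definitions and pci_name() here are simplified stand-ins for the kernel ones, format_p2p_unavailable() is a hypothetical helper, and the message wording is an assumption rather than the text of any posted patch:

```c
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel types, illustration only. */
struct pci_dev { const char *bdf; };		/* e.g. "0000:03:00.0" */
struct amdgpu_device { struct pci_dev *pdev; };

/* Stand-in for the kernel's pci_name(), which returns the BDF string. */
static const char *pci_name(const struct pci_dev *pdev)
{
	return pdev->bdf;
}

/*
 * Builds the line a dev_info() call on adev might produce. The
 * "amdgpu <BDF>: " prefix already identifies adev, so only the peer
 * has to be named in the message body. In the driver this would be
 * roughly (wording assumed):
 *   dev_info(adev->dev, "... peer %s ...\n", pci_name(peer_adev->pdev));
 */
static int format_p2p_unavailable(char *buf, size_t len,
				  const struct amdgpu_device *adev,
				  const struct amdgpu_device *peer_adev)
{
	return snprintf(buf, len,
			"amdgpu %s: PCIe based P2P link to peer %s is not available\n",
			pci_name(adev->pdev), pci_name(peer_adev->pdev));
}
```

The point of the design is that the device prefix comes for free from dev_info, so the format string only carries the peer's BDF.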

Alternatively, you can just set the last parameter of pci_p2pdma_distance to "true", although that will produce more scary-looking warning messages. I believe Lijo disabled that at some point for that reason.

Regards,
  Felix



>
>       bool is_large_bar = adev->gmc.visible_vram_size &&
>               adev->gmc.real_vram_size == adev->gmc.visible_vram_size;

