[PATCH] drm/amdkfd: report atomics support in io_links over xgmi

Felix Kuehling felix.kuehling at amd.com
Fri Apr 30 18:55:28 UTC 2021


Am 2021-04-30 um 2:16 p.m. schrieb Zeng, Oak:
> Sorry I should be clearer. 
>
> dev->gpu->pci_atomic_requested is set to the value of adev->have_atomics_support - see function amdgpu_amdkfd_have_atomics_support. adev->have_atomics_support is set through function pci_enable_atomic_ops_to_root currently in amdgpu_device_init - I don't think this logic is correct for xgmi connected devices. For xgmi connected devices, adev->have_atomics_support should always set to true. + at Cornwall, Jay to comment as the original author of this codes.
>
> The codes Jon put below refers dev->gpu->pci_atomic_requested to set the io link properties. I think we should fix adev->have_atomics_support which is the source of dev->gpu->pci_atomic_requested. Once adev->have_atomics_support is fixed, part of the code in kfd_topology.c doesn't need to be changed.

I disagree. You're basically just changing the name of a variable. That
doesn't change any of the logic.


>  Currently kfd_topology.c is the only consumer of adev->have_atomics_support and seems we only use such information for gpu-gpu io-link not for gpu-cpu iolink properties. But I still think we fix it from the source is better because in the future there might be code refers to adev->have_atomics_support. The code I put below is wrong though, it should be: 
> If (!adev->gmc.xgmi.num_physical_nodes)
> 	r = pci_enable_atomic_ops_to_root(adev->pdev,
> 					  PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> 					  PCI_EXP_DEVCAP2_ATOMIC_COMP64);

No. pci_enable_atomic_ops_to_root enables the GPU to perform atomic ops
on system memory over PCIe. The number of nodes in an XGMI hive has no
influence on this. The only situation where we don't need this is on
GPUs that connect to the host via XGMI. So the condition if
(!adev->gmc.xgmi.connected_to_cpu) is correct.

Regards,
  Felix


>
> Regards,
> Oak 
>
>  
>
> On 2021-04-29, 10:45 PM, "Kuehling, Felix" <Felix.Kuehling at amd.com> wrote:
>
>
>     Am 2021-04-29 um 9:12 p.m. schrieb Zeng, Oak:
>     > I think part of this can be done more clean in amdgpu_device_init:
>     >
>     > 	r = 0;
>     > 	If (!adev->gmc.xgmi.connected_to_cpu)
>     > 		/* enable PCIE atomic ops */
>     > 		r = pci_enable_atomic_ops_to_root(adev->pdev,
>     > 					  PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>     > 					  PCI_EXP_DEVCAP2_ATOMIC_COMP64);
>     > 	if (r) {
>     > 		adev->have_atomics_support = false;
>     > 		DRM_INFO("PCIE atomic ops is not supported\n");
>     > 	} else {
>     > 		adev->have_atomics_support = true;
>     > 	}
>
>     This code is already in amdgpu_device_init. I'm not sure what's your
>     point. Are you suggesting that the topology flags should be updated in
>     amdgpu_device_init? That's not really possible because the KFD topology
>     data structures don't exist at that time (they do only after the call to
>     amdgpu_device_ip_init) and the data structures are not known in
>     amdgpu_device.c. It's cleaner to keep this compartmentalized in
>     kfd_topology.c.
>
>     Regards,
>       Felix
>
>
>     > Regards,
>     > Oak 
>     >
>     >  
>     >
>     > On 2021-04-29, 5:36 AM, "Kim, Jonathan" <Jonathan.Kim at amd.com> wrote:
>     >
>     >     Link atomics support over xGMI should be reported independently of PCIe.
>     >
>     >     Signed-off-by: Jonathan Kim <jonathan.kim at amd.com>
>     >     Tested-by: Ramesh Errabolu <ramesh.errabolu at amd.com>
>     >     ---
>     >      drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 29 ++++++++++++++---------
>     >      1 file changed, 18 insertions(+), 11 deletions(-)
>     >
>     >     diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>     >     index 083ac9babfa8..30430aefcfc7 100644
>     >     --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>     >     +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>     >     @@ -1196,6 +1196,7 @@ static void kfd_fill_iolink_non_crat_info(struct kfd_topology_device *dev)
>     >      {
>     >      	struct kfd_iolink_properties *link, *cpu_link;
>     >      	struct kfd_topology_device *cpu_dev;
>     >     +	struct amdgpu_device *adev;
>     >      	uint32_t cap;
>     >      	uint32_t cpu_flag = CRAT_IOLINK_FLAGS_ENABLED;
>     >      	uint32_t flag = CRAT_IOLINK_FLAGS_ENABLED;
>     >     @@ -1203,18 +1204,24 @@ static void kfd_fill_iolink_non_crat_info(struct kfd_topology_device *dev)
>     >      	if (!dev || !dev->gpu)
>     >      		return;
>     >
>     >     -	pcie_capability_read_dword(dev->gpu->pdev,
>     >     -			PCI_EXP_DEVCAP2, &cap);
>     >     -
>     >     -	if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>     >     -		     PCI_EXP_DEVCAP2_ATOMIC_COMP64)))
>     >     -		cpu_flag |= CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
>     >     -			CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
>     >     +	adev = (struct amdgpu_device *)(dev->gpu->kgd);
>     >     +	if (!adev->gmc.xgmi.connected_to_cpu) {
>     >     +		pcie_capability_read_dword(dev->gpu->pdev,
>     >     +				PCI_EXP_DEVCAP2, &cap);
>     >     +
>     >     +		if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>     >     +			     PCI_EXP_DEVCAP2_ATOMIC_COMP64)))
>     >     +			cpu_flag |= CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
>     >     +				CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
>     >     +	}
>     >
>     >     -	if (!dev->gpu->pci_atomic_requested ||
>     >     -	    dev->gpu->device_info->asic_family == CHIP_HAWAII)
>     >     -		flag |= CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
>     >     -			CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
>     >     +	if (!adev->gmc.xgmi.num_physical_nodes) {
>     >     +		if (!dev->gpu->pci_atomic_requested ||
>     >     +				dev->gpu->device_info->asic_family ==
>     >     +							CHIP_HAWAII)
>     >     +			flag |= CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
>     >     +				CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
>     >     +	}
>     >
>     >      	/* GPU only creates direct links so apply flags setting to all */
>     >      	list_for_each_entry(link, &dev->io_link_props, list) {
>     >     -- 
>     >     2.17.1
>     >
>     >
>


More information about the amd-gfx mailing list