[PATCH] drm/amdkfd: report atomics support in io_links over xgmi

Kim, Jonathan Jonathan.Kim at amd.com
Fri Apr 30 19:20:02 UTC 2021


[AMD Official Use Only - Internal Distribution Only]

> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Friday, April 30, 2021 2:55 PM
> To: Zeng, Oak <Oak.Zeng at amd.com>; Kim, Jonathan
> <Jonathan.Kim at amd.com>; amd-gfx at lists.freedesktop.org; Cornwall, Jay
> <Jay.Cornwall at amd.com>
> Cc: Errabolu, Ramesh <Ramesh.Errabolu at amd.com>
> Subject: Re: [PATCH] drm/amdkfd: report atomics support in io_links over
> xgmi
>
> Am 2021-04-30 um 2:16 p.m. schrieb Zeng, Oak:
> > Sorry I should be clearer.
> >
> > dev->gpu->pci_atomic_requested is set to the value of adev-
> >have_atomics_support - see function
> amdgpu_amdkfd_have_atomics_support. adev->have_atomics_support is
> set through function pci_enable_atomic_ops_to_root currently in
> amdgpu_device_init - I don't think this logic is correct for xgmi connected
> devices. For xgmi connected devices, adev->have_atomics_support should
> always set to true. + at Cornwall, Jay to comment as the original author of this
> codes.
> >
> > The codes Jon put below refers dev->gpu->pci_atomic_requested to set
> the io link properties. I think we should fix adev->have_atomics_support
> which is the source of dev->gpu->pci_atomic_requested. Once adev-
> >have_atomics_support is fixed, part of the code in kfd_topology.c doesn't
> need to be changed.
>
> I disagree. You're basically just changing the name of a variable. That doesn't
> change any of the logic.
>
>
> >  Currently kfd_topology.c is the only consumer of adev-
> >have_atomics_support and seems we only use such information for gpu-
> gpu io-link not for gpu-cpu iolink properties. But I still think we fix it from the
> source is better because in the future there might be code refers to adev-
> >have_atomics_support. The code I put below is wrong though, it should be:
> > If (!adev->gmc.xgmi.num_physical_nodes)
> > r = pci_enable_atomic_ops_to_root(adev->pdev,
> >
> PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> >
> PCI_EXP_DEVCAP2_ATOMIC_COMP64);
>
> No. pci_enable_atomic_ops_to_root enables the GPU to perform atomic
> ops on system memory over PCIe. The number of nodes in an XGMI hive has
> no influence on this. The only situation where we don't need this is on GPUs
> that connect to the host via XGMI. So the condition if
> (!adev->gmc.xgmi.connected_to_cpu) is correct.


Then it seems like this was missed in the review and a follow on fix will be needed to be applied (basically remove this condition and move the denial of the pci_atomic_requested check under the kfd topology !connected_to_cpu condition that currently only denies pcie_capability_read_dword checks for cpu POV flags:

>     >     +if (!adev->gmc.xgmi.num_physical_nodes) {
>     >     +if (!dev->gpu->pci_atomic_requested ||
>     >     +dev->gpu->device_info->asic_family ==
>     >     +CHIP_HAWAII)
>     >     +flag |= CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
>     >     +CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
>     >     +}

Thanks,

Jon

>
> Regards,
>   Felix
>
>
> >
> > Regards,
> > Oak
> >
> >
> >
> > On 2021-04-29, 10:45 PM, "Kuehling, Felix" <Felix.Kuehling at amd.com>
> wrote:
> >
> >
> >     Am 2021-04-29 um 9:12 p.m. schrieb Zeng, Oak:
> >     > I think part of this can be done more clean in amdgpu_device_init:
> >     >
> >     > r = 0;
> >     > If (!adev->gmc.xgmi.connected_to_cpu)
> >     > /* enable PCIE atomic ops */
> >     > r = pci_enable_atomic_ops_to_root(adev->pdev,
> >     >
> PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> >     >
> PCI_EXP_DEVCAP2_ATOMIC_COMP64);
> >     > if (r) {
> >     > adev->have_atomics_support = false;
> >     > DRM_INFO("PCIE atomic ops is not supported\n");
> >     > } else {
> >     > adev->have_atomics_support = true;
> >     > }
> >
> >     This code is already in amdgpu_device_init. I'm not sure what's your
> >     point. Are you suggesting that the topology flags should be updated in
> >     amdgpu_device_init? That's not really possible because the KFD topology
> >     data structures don't exist at that time (they do only after the call to
> >     amdgpu_device_ip_init) and the data structures are not known in
> >     amdgpu_device.c. It's cleaner to keep this compartmentalized in
> >     kfd_topology.c.
> >
> >     Regards,
> >       Felix
> >
> >
> >     > Regards,
> >     > Oak
> >     >
> >     >
> >     >
> >     > On 2021-04-29, 5:36 AM, "Kim, Jonathan" <Jonathan.Kim at amd.com>
> wrote:
> >     >
> >     >     Link atomics support over xGMI should be reported independently of
> PCIe.
> >     >
> >     >     Signed-off-by: Jonathan Kim <jonathan.kim at amd.com>
> >     >     Tested-by: Ramesh Errabolu <ramesh.errabolu at amd.com>
> >     >     ---
> >     >      drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 29
> ++++++++++++++---------
> >     >      1 file changed, 18 insertions(+), 11 deletions(-)
> >     >
> >     >     diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >     >     index 083ac9babfa8..30430aefcfc7 100644
> >     >     --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >     >     +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> >     >     @@ -1196,6 +1196,7 @@ static void
> kfd_fill_iolink_non_crat_info(struct kfd_topology_device *dev)
> >     >      {
> >     >      struct kfd_iolink_properties *link, *cpu_link;
> >     >      struct kfd_topology_device *cpu_dev;
> >     >     +struct amdgpu_device *adev;
> >     >      uint32_t cap;
> >     >      uint32_t cpu_flag = CRAT_IOLINK_FLAGS_ENABLED;
> >     >      uint32_t flag = CRAT_IOLINK_FLAGS_ENABLED;
> >     >     @@ -1203,18 +1204,24 @@ static void
> kfd_fill_iolink_non_crat_info(struct kfd_topology_device *dev)
> >     >      if (!dev || !dev->gpu)
> >     >      return;
> >     >
> >     >     -pcie_capability_read_dword(dev->gpu->pdev,
> >     >     -PCI_EXP_DEVCAP2, &cap);
> >     >     -
> >     >     -if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> >     >     -     PCI_EXP_DEVCAP2_ATOMIC_COMP64)))
> >     >     -cpu_flag |=
> CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
> >     >     -CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
> >     >     +adev = (struct amdgpu_device *)(dev->gpu->kgd);
> >     >     +if (!adev->gmc.xgmi.connected_to_cpu) {
> >     >     +pcie_capability_read_dword(dev->gpu->pdev,
> >     >     +PCI_EXP_DEVCAP2, &cap);
> >     >     +
> >     >     +if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> >     >     +     PCI_EXP_DEVCAP2_ATOMIC_COMP64)))
> >     >     +cpu_flag |=
> CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
> >     >     +
> CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
> >     >     +}
> >     >
> >     >     -if (!dev->gpu->pci_atomic_requested ||
> >     >     -    dev->gpu->device_info->asic_family == CHIP_HAWAII)
> >     >     -flag |= CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
> >     >     -CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
> >     >     +if (!adev->gmc.xgmi.num_physical_nodes) {
> >     >     +if (!dev->gpu->pci_atomic_requested ||
> >     >     +dev->gpu->device_info->asic_family
> ==
> >     >     +CHIP_HAWAII)
> >     >     +flag |=
> CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
> >     >     +
> CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
> >     >     +}
> >     >
> >     >      /* GPU only creates direct links so apply flags setting to all */
> >     >      list_for_each_entry(link, &dev->io_link_props, list) {
> >     >     --
> >     >     2.17.1
> >     >
> >     >
> >


More information about the amd-gfx mailing list