[PATCH] drm/amdkfd: On GFX11 check PCIe atomics support and set CP_HQD_HQ_STATUS0[29]
Sider, Graham
Graham.Sider at amd.com
Tue Apr 4 15:59:41 UTC 2023
[Public]
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Russell, Kent
> Sent: Tuesday, April 4, 2023 9:43 AM
> To: Somasekharan, Sreekant <Sreekant.Somasekharan at amd.com>; amd-
> gfx at lists.freedesktop.org
> Cc: Somasekharan, Sreekant <Sreekant.Somasekharan at amd.com>
> Subject: RE: [PATCH] drm/amdkfd: On GFX11 check PCIe atomics support and
> set CP_HQD_HQ_STATUS0[29]
>
> Caution: This message originated from an External Source. Use proper
> caution when opening attachments, clicking links, or responding.
>
>
> [AMD Official Use Only - General]
>
> Comments inline
>
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> > Sreekant Somasekharan
> > Sent: Monday, April 3, 2023 3:59 PM
> > To: amd-gfx at lists.freedesktop.org
> > Cc: Somasekharan, Sreekant <Sreekant.Somasekharan at amd.com>
> > Subject: [PATCH] drm/amdkfd: On GFX11 check PCIe atomics support and
> > set CP_HQD_HQ_STATUS0[29]
> >
> > On GFX11, CP_HQD_HQ_STATUS0[29] bit will be used by CPFW to
> > acknowledge whether PCIe atomics are supported. The default value of
> > this bit is set to 0. Driver will check whether PCIe atomics are
> > supported and set the bit to 1 if supported. This will force CPFW to use real
> atomic ops.
> > If the bit is not set, CPFW will default to read/modify/write using
> > the firmware itself.
> >
> > This is applicable only to RS64 based GFX11 with MEC FW greater than
> > or equal to 509. If MEC FW is less than 509, PCIe atomics needs to be
> > supported, else it will skip the device.
> >
> > This commit also involves moving amdgpu_amdkfd_device_probe()
> function
> > call after per-IP early_init loop in amdgpu_device_ip_early_init()
> > function so as to check for RS64 enabled device.
> >
> > Signed-off-by: Sreekant Somasekharan
> <sreekant.somasekharan at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++++++++++
> > drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 9 +++++++++
> > 3 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7116119ed038..b3a754ca0923 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2150,7 +2150,6 @@ static int amdgpu_device_ip_early_init(struct
> > amdgpu_device *adev)
> > adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > }
> >
> > - amdgpu_amdkfd_device_probe(adev);
> >
> > adev->pm.pp_feature = amdgpu_pp_feature_mask;
> > if (amdgpu_sriov_vf(adev) || sched_policy ==
> > KFD_SCHED_POLICY_NO_HWS)
> > @@ -2206,6 +2205,7 @@ static int amdgpu_device_ip_early_init(struct
> > amdgpu_device *adev)
> > if (!total)
> > return -ENODEV;
> >
> > + amdgpu_amdkfd_device_probe(adev);
> > adev->cg_flags &= amdgpu_cg_mask;
> > adev->pg_flags &= amdgpu_pg_mask;
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > index 521dfa88aad8..64a295a35d37 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> > @@ -204,6 +204,17 @@ static void kfd_device_info_init(struct kfd_dev
> *kfd,
> > /* Navi1x+ */
> > if (gc_version >= IP_VERSION(10, 1, 1))
> > kfd->device_info.needs_pci_atomics =
> > true;
> > + } else if (gc_version < IP_VERSION(12, 0, 0)) {
>
>
> What if we get a GFX9 with MEC v509? Wouldn't that trigger this too?
> Wondering if this should be if (gc_version>=IP_VERSION(11,0,0) &&
> gc_version < IP_VERSION(12,0,0)) thus ensuring it's only GFX11. Or maybe
> there is some better check than that. Either way, checking that it's < GFX11
> might false-positive on GFX10- too, so we should probably be explicit in our
> GFX check that it's only GFX11.
The previous condition is for gc_version < IP_VERSION(11, 0, 0), so that condition will (and currently is) taken for gfx9/gfx10/etc.
That's to say the logic after this change will look like:
If (KFD_IS_SOC15(kfd)) {
<...>
If (gc_version < IP_VERSION(11, 0, 0)) {
<...>
} else if (gc_version < IP_VERSION(12, 0, 0)) {
<...>
}
}
So this new path will only be taken for gfx11.
Best,
Graham
>
> Kent
>
> > + /* On GFX11 running on RS64, MEC FW version must
> > + be
> > greater than
> > + * or equal to version 509 to support
> > + acknowledging
> > whether
> > + * PCIe atomics are supported. Before MEC
> > + version 509,
> > PCIe
> > + * atomics are required. After that, the FW's
> > + use of
> > atomics
> > + * is controlled by CP_HQD_HQ_STATUS0[29].
> > + * This will fail on GFX11 when PCIe atomics are
> > + not
> > supported
> > + * and MEC FW version < 509 for RS64 based CPFW.
> > + */
> > + kfd->device_info.needs_pci_atomics = true;
> > + kfd->device_info.no_atomic_fw_version =
> > + kfd->adev-
> > >gfx.rs64_enable ? 509 : 0;
> > }
> > } else {
> > kfd->device_info.doorbell_size = 4; diff --git
> > a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > index 4a9af800b1f1..c5ea594abbf6 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
> > @@ -143,6 +143,15 @@ static void init_mqd(struct mqd_manager *mm,
> void
> > **mqd,
> > 1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT
> > |
> > 1 <<
> > CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
> >
> > + /*
> > + * If PCIe atomics are supported, set CP_HQD_HQ_STATUS0[29] == 1
> > + * to force CPFW to use atomics. This is supported only on MEC FW
> > + * version >= 509 and on RS64 based CPFW only. On previous versions,
> > + * platforms running on GFX11 must support atomics else will
> > + skip the
> > device.
> > + */
> > + if (amdgpu_amdkfd_have_atomics_support((mm->dev->adev)))
> > + m->cp_hqd_hq_status0 |= 1 << 29;
> > +
> > if (q->format == KFD_QUEUE_FORMAT_AQL) {
> > m->cp_hqd_aql_control =
> > 1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
> > --
> > 2.25.1
More information about the amd-gfx
mailing list