[PATCH] drm/amdgpu: Disable ACA on VFs
Skvortsov, Victor
Victor.Skvortsov at amd.com
Thu Apr 3 13:14:38 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Sent: Wednesday, April 2, 2025 11:02 PM
> To: Skvortsov, Victor <Victor.Skvortsov at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhao, Victor
> <Victor.Zhao at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: Disable ACA on VFs
>
> > -----Original Message-----
> > From: Skvortsov, Victor <Victor.Skvortsov at amd.com>
> > Sent: Thursday, April 3, 2025 6:16 AM
> > To: amd-gfx at lists.freedesktop.org
> > Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhao, Victor
> > <Victor.Zhao at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Skvortsov,
> > Victor <Victor.Skvortsov at amd.com>
> > Subject: [PATCH] drm/amdgpu: Disable ACA on VFs
> >
> > VFs query RAS error counts directly from the host with
> > AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY. When ACA is enabled, an unusable
> > aca_sysfs is created instead of calling amdgpu_ras_sysfs_create().
> >
> > Likewise, VFs depend on host support to query CPERs, rather than on the
> > ACA component.
> >
> > Signed-off-by: Victor Skvortsov <victor.skvortsov at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 4 ++--
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 ++++++----
> > 2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > index 360e07a5c7c1..5a234eadae8b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
> > @@ -549,7 +549,7 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
> > {
> > int r;
> >
> > - if (!amdgpu_aca_is_enabled(adev))
> > + if (!amdgpu_aca_is_enabled(adev) &&
> > + !amdgpu_sriov_ras_cper_en(adev))
>
> [Tao] can we put amdgpu_sriov_ras_cper_en into amdgpu_aca_is_enabled?
[Victor] This will cause problems inside amdgpu_ras_sysfs_create() since VFs use the legacy sysfs to report IP block error counts through AMDGPU_RAS_VIRT_ERROR_COUNT_QUERY.
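
To illustrate the point, here is a minimal standalone sketch, not driver code. The struct mock_dev, its fields, and the helpers below are hypothetical stand-ins, and sysfs_kind() only assumes that the sysfs choice follows the ACA predicate, as the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device; not the real layout. */
struct mock_dev {
	bool aca_enabled;	/* models adev->aca.is_enabled */
	bool vf_cper_en;	/* models amdgpu_sriov_ras_cper_en(adev) */
};

/* ACA predicate kept separate from the VF CPER check, as in the patch. */
static bool aca_is_enabled(const struct mock_dev *d)
{
	return d->aca_enabled;
}

/* The suggested variant: fold the VF CPER check into the ACA predicate. */
static bool aca_is_enabled_folded(const struct mock_dev *d)
{
	return d->aca_enabled || d->vf_cper_en;
}

/* Mirrors the patched early return in amdgpu_cper_init()/amdgpu_cper_fini():
 * CPER setup is skipped only when both checks are false. */
static bool cper_active(const struct mock_dev *d)
{
	return aca_is_enabled(d) || d->vf_cper_en;
}

/* Assumed model of the sysfs choice: ACA nodes when the predicate is
 * true, legacy RAS nodes otherwise. */
static const char *sysfs_kind(bool aca_predicate)
{
	return aca_predicate ? "aca" : "legacy";
}
```

For a VF with host CPER support (aca_enabled = false, vf_cper_en = true), cper_active() is true with either predicate, but sysfs_kind(aca_is_enabled_folded(&vf)) selects the ACA nodes while sysfs_kind(aca_is_enabled(&vf)) keeps the legacy ones, which is the case the separate check preserves.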
>
> > return 0;
> >
> > r = amdgpu_cper_ring_init(adev);
> > @@ -568,7 +568,7 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
> >
> > int amdgpu_cper_fini(struct amdgpu_device *adev)
> > {
> > - if (!amdgpu_aca_is_enabled(adev))
> > + if (!amdgpu_aca_is_enabled(adev) &&
> > + !amdgpu_sriov_ras_cper_en(adev))
> > return 0;
> >
> > adev->cper.enabled = false;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index ebf1f63d0442..5bb7673fd28e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -3794,10 +3794,12 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev)
> > adev->ras_hw_enabled & amdgpu_ras_mask;
> >
> > /* aca is disabled by default except for psp v13_0_6/v13_0_12/v13_0_14 */
> > - adev->aca.is_enabled =
> > - (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) ||
> > - amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) ||
> > - amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14));
> > + if (!amdgpu_sriov_vf(adev)) {
> > + adev->aca.is_enabled =
> > + (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) ||
> > + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) ||
> > + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14));
> > + }
> >
> > /* bad page feature is not applicable to specific app platform */
> > if (adev->gmc.is_app_apu &&
> > --
> > 2.34.1
>
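
For reference, the ACA decision in the amdgpu_ras.c hunk above can be modelled outside the kernel roughly as follows. This is only a sketch: MOCK_IP_VERSION and aca_enabled_after_patch() are hypothetical names, the version encoding is illustrative and not the kernel's IP_VERSION() macro, and the prev argument stands for whatever adev->aca.is_enabled held before the check (presumably false).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative encoding only; not the kernel's IP_VERSION() macro. */
#define MOCK_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/*
 * Models the patched block in amdgpu_ras_check_supported():
 * on a VF the assignment is skipped, so the previous value is kept;
 * on bare metal ACA is enabled only for MP0 13.0.6, 13.0.12 and 13.0.14.
 */
static bool aca_enabled_after_patch(uint32_t mp0_ver, bool is_vf, bool prev)
{
	if (is_vf)
		return prev;

	return mp0_ver == MOCK_IP_VERSION(13, 0, 6) ||
	       mp0_ver == MOCK_IP_VERSION(13, 0, 12) ||
	       mp0_ver == MOCK_IP_VERSION(13, 0, 14);
}
```

With prev = false, a VF on MP0 13.0.6 ends up with ACA disabled, while the same IP version on bare metal still enables it.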