[PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Oct 20 12:24:27 UTC 2023
On 17.10.23 at 16:36, Stanley.Yang wrote:
> This is a workaround: the kiq ring test fails in the suspend stage when
> doing ras recovery for gfx v9_4_3.
Any idea why that failed? Problems like this usually point to an
incorrect init or, in this case, re-init procedure, and they are exactly
what the ring test should uncover.
Christian.
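
To make the concern above concrete, here is a minimal user-space sketch of what a ring test checks in general. It is a simplified model, not the amdgpu implementation: `model_ring`, `model_submit_write`, `model_ring_test` and the two marker values are hypothetical names chosen for illustration. The idea is that the test seeds a scratch location with a sentinel, asks the ring to overwrite it, and fails if the value never changes, which is how a broken init or re-init sequence would show up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of a command ring; not amdgpu code. */
struct model_ring {
	bool initialized;	/* set only by a correct (re-)init sequence */
	uint32_t scratch;	/* location the test packet should overwrite */
};

#define SENTINEL 0xCAFEDEADu	/* value seeded before the test */
#define EXPECTED 0xDEADBEEFu	/* value a working ring writes back */

/* Pretend to submit a "write EXPECTED to scratch" packet. */
static void model_submit_write(struct model_ring *ring)
{
	/* A ring that was never (re-)initialized does not execute packets. */
	if (ring->initialized)
		ring->scratch = EXPECTED;
}

/* Returns 0 if the ring processed the packet, -1 if it never did. */
static int model_ring_test(struct model_ring *ring, int max_polls)
{
	ring->scratch = SENTINEL;
	model_submit_write(ring);

	/* Poll a bounded number of times, as a stand-in for a timeout. */
	for (int i = 0; i < max_polls; i++) {
		if (ring->scratch == EXPECTED)
			return 0;
	}
	return -1;
}
```

In this model, skipping the test hides the `initialized == false` case rather than fixing it, which is the reason for asking about the root cause.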
>
> Change-Id: I8de9900aa76706f59bc029d4e9e8438c6e1db8e0
> Signed-off-by: Stanley.Yang <Stanley.Yang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 9a158018ae16..902e60203809 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -29,6 +29,7 @@
> #include "amdgpu_rlc.h"
> #include "amdgpu_ras.h"
> #include "amdgpu_xcp.h"
> +#include "amdgpu_xgmi.h"
>
> /* delay 0.1 second to enable gfx off feature */
> #define GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
> @@ -501,6 +502,9 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
>  {
>  	struct amdgpu_kiq *kiq = &adev->gfx.kiq[xcc_id];
>  	struct amdgpu_ring *kiq_ring = &kiq->ring;
> +	struct amdgpu_hive_info *hive;
> +	struct amdgpu_ras *ras;
> +	int hive_ras_recovery = 0;
>  	int i, r = 0;
>  	int j;
>
> @@ -521,6 +525,23 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
>  					   RESET_QUEUES, 0, 0);
>  	}
>
> +	/**
> +	 * This is workaround: only skip kiq_ring test
> +	 * during ras recovery in suspend stage for gfx v9_4_3
> +	 */
> +	hive = amdgpu_get_xgmi_hive(adev);
> +	if (hive) {
> +		hive_ras_recovery = atomic_read(&hive->ras_recovery);
> +		amdgpu_put_xgmi_hive(hive);
> +	}
> +
> +	ras = amdgpu_ras_get_context(adev);
> +	if ((amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3)) &&
> +		ras && (atomic_read(&ras->in_recovery) || hive_ras_recovery)) {
> +		spin_unlock(&kiq->ring_lock);
> +		return 0;
> +	}
> +
>  	if (kiq_ring->sched.ready && !adev->job_hang)
>  		r = amdgpu_ring_test_helper(kiq_ring);
>  	spin_unlock(&kiq->ring_lock);
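
The skip condition in the hunk above can be read as a single predicate. The sketch below restates it in plain user-space C so the individual cases are easy to check. It is only an illustration under stated assumptions: `model_dev`, `model_hive`, `model_ras` and `model_skip_kiq_ring_test` are hypothetical stand-ins, plain ints replace the kernel's atomics, and the sketch initializes the hive flag to 0 so that the no-hive case is well defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not amdgpu code. */
struct model_hive { int ras_recovery; };
struct model_ras { int in_recovery; };

struct model_dev {
	unsigned int gc_major, gc_minor, gc_rev; /* GC IP version */
	struct model_hive *hive;	/* NULL if the device is not in a hive */
	struct model_ras *ras;		/* NULL if there is no RAS context */
};

/* True when the kiq ring test would be skipped by the workaround. */
static bool model_skip_kiq_ring_test(const struct model_dev *dev)
{
	int hive_ras_recovery = 0;	/* assumed default when there is no hive */
	bool is_gfx_9_4_3;

	if (dev->hive)
		hive_ras_recovery = dev->hive->ras_recovery;

	is_gfx_9_4_3 = dev->gc_major == 9 && dev->gc_minor == 4 &&
		       dev->gc_rev == 3;

	/* Skip only on gfx v9_4_3, with a RAS context, during recovery. */
	return is_gfx_9_4_3 && dev->ras != NULL &&
	       (dev->ras->in_recovery || hive_ras_recovery);
}
```

Written this way, the predicate makes it visible that the skip applies to every ring test issued while either recovery flag is set, not only to the one known to fail in the suspend stage.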