[PATCH 1/2] drm/amdgpu: stop cp resume when compute ring test failed
Zhang, Hawking
Hawking.Zhang at amd.com
Thu Apr 23 13:43:09 UTC 2020
Would you mind enabling this and trying it again? The recent GPU reset testing on Vega20 looks very positive.
Regards,
Hawking
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Thursday, April 23, 2020 20:31
To: Zhang, Hawking <Hawking.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu: stop cp resume when compute ring test failed
On 23.04.20 at 11:01, Hawking Zhang wrote:
> driver should stop cp resume once compute ring test failed
Mhm, we intentionally ignored those errors because the compute rings sometimes don't come up again after a GPU reset.
We even have the necessary logic in the SW scheduler to redirect the jobs to another compute ring when one fails to come up again.
Christian.
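
To make the two positions in this thread concrete, here is a minimal standalone sketch, not driver code. It contrasts stopping at the first failed ring test (what the patch does) with testing every ring and ignoring failures (the behaviour described in the reply above), plus a toy fallback that picks another ring when the preferred one did not come up. Everything named toy_* or resume_*, the pick_ready_ring() helper, the "ready" flag and the -1 error value are assumptions made for illustration; they are not the amdgpu structures or the scheduler's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compute ring; NOT struct amdgpu_ring. */
struct toy_ring {
	bool hw_ok;	/* would a ring test pass on this ring? */
	bool ready;	/* result of the last test, consulted by the picker */
};

/* Stand-in for a ring test: 0 on success, -1 (illustrative) on failure. */
static int toy_ring_test(struct toy_ring *ring)
{
	ring->ready = ring->hw_ok;
	return ring->hw_ok ? 0 : -1;
}

/* Policy A (as in the patch): abort on the first failing ring. */
static int resume_fail_fast(struct toy_ring *rings, size_t n)
{
	assert(rings != NULL);
	for (size_t i = 0; i < n; i++) {
		int r = toy_ring_test(&rings[i]);

		if (r)
			return r;	/* rings after i are never tested */
	}
	return 0;
}

/* Policy B (as described in the reply): test all rings, ignore failures. */
static int resume_tolerant(struct toy_ring *rings, size_t n)
{
	assert(rings != NULL);
	for (size_t i = 0; i < n; i++)
		(void)toy_ring_test(&rings[i]);
	return 0;
}

/*
 * Toy fallback: return the index of the first ready ring, starting at
 * 'preferred' and wrapping around; -1 if no ring is ready.
 */
static int pick_ready_ring(const struct toy_ring *rings, size_t n,
			   size_t preferred)
{
	for (size_t k = 0; k < n; k++) {
		size_t idx = (preferred + k) % n;

		if (rings[idx].ready)
			return (int)idx;
	}
	return -1;
}
```

With three rings where only the middle one fails, resume_fail_fast() returns an error and never tests the third ring, while resume_tolerant() returns 0 and pick_ready_ring() can still hand work aimed at the failed ring to the third one. That is the trade-off under discussion: failing fast surfaces the problem immediately, whereas tolerating it keeps the remaining rings usable.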
>
> Change-Id: I4cd3328f38e0755d0c877484936132d204c9fe50
> Signed-off-by: Hawking Zhang <Hawking.Zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 4 +++-
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 4 +++-
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index b2f10e3..fcee758 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> @@ -3132,7 +3132,9 @@ static int gfx_v7_0_cp_compute_resume(struct amdgpu_device *adev)
>
>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>  		ring = &adev->gfx.compute_ring[i];
> -		amdgpu_ring_test_helper(ring);
> +		r = amdgpu_ring_test_helper(ring);
> +		if (r)
> +			return r;
>  	}
>
>  	return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 6c56ced..8dc8e90 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4781,7 +4781,9 @@ static int gfx_v8_0_cp_test_all_rings(struct amdgpu_device *adev)
>
>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>  		ring = &adev->gfx.compute_ring[i];
> -		amdgpu_ring_test_helper(ring);
> +		r = amdgpu_ring_test_helper(ring);
> +		if (r)
> +			return r;
>  	}
>
>  	return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 09aa5f5..20937059 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3846,7 +3846,9 @@ static int gfx_v9_0_cp_resume(struct amdgpu_device *adev)
>
>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>  		ring = &adev->gfx.compute_ring[i];
> -		amdgpu_ring_test_helper(ring);
> +		r = amdgpu_ring_test_helper(ring);
> +		if (r)
> +			return r;
>  	}
>
>  	gfx_v9_0_enable_gui_idle_interrupt(adev, true);