[PATCH 1/2] drm/amdgpu: stop cp resume when compute ring test failed
Alex Deucher
alexdeucher at gmail.com
Thu Apr 23 15:06:58 UTC 2020
On Thu, Apr 23, 2020 at 10:55 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Yeah, we certainly could try this again. But maybe split that up into
> individual patches for gfx7/8/9.
>
> In other words make it easy to revert if this still doesn't work well on
> gfx7 or some other generation.
Yeah, unless there is a good reason, I don't think we should do this.
IIRC, compute rings randomly fail to recover on a lot of hw
generations.
Alex
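
For reference, the two policies being debated can be sketched in plain C. This is only an illustration under stated assumptions: `struct demo_ring`, `demo_ring_test()` and the other `demo_*` names are hypothetical stand-ins, not the amdgpu API, and the sketch assumes the ring test records a per-ring ready flag that a scheduler could consult when choosing where to submit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compute ring; not the real struct amdgpu_ring. */
struct demo_ring {
	bool hw_ok;	/* simulated outcome of the hardware ring test */
	bool ready;	/* whether jobs may be submitted to this ring */
};

/* Simulated ring test: records the outcome in ->ready, returns 0 or -1. */
static int demo_ring_test(struct demo_ring *ring)
{
	ring->ready = ring->hw_ok;
	return ring->hw_ok ? 0 : -1;
}

/* Policy proposed by the patch: abort resume at the first failing ring. */
static int demo_resume_fail_fast(struct demo_ring *rings, size_t n)
{
	assert(rings != NULL);
	for (size_t i = 0; i < n; i++) {
		int r = demo_ring_test(&rings[i]);

		if (r)
			return r;
	}
	return 0;
}

/* Existing policy as described in the thread: test every ring, ignore errors. */
static int demo_resume_tolerant(struct demo_ring *rings, size_t n)
{
	assert(rings != NULL);
	for (size_t i = 0; i < n; i++)
		demo_ring_test(&rings[i]);
	return 0;
}

/* Pick the first ring that came back up, or -1 if none did. */
static int demo_pick_ring(const struct demo_ring *rings, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (rings[i].ready)
			return (int)i;
	return -1;
}
```

With the tolerant policy, a single dead ring leaves the remaining rings usable and work can be steered to one of them via `demo_pick_ring()`. With the fail-fast policy, the same dead ring turns the whole resume into an error.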
>
> Christian.
>
> Am 23.04.20 um 15:43 schrieb Zhang, Hawking:
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > Would you mind enabling this and trying it again? The recent GPU reset testing on vega20 looks very positive.
> >
> > Regards,
> > Hawking
> > -----Original Message-----
> > From: Christian König <ckoenig.leichtzumerken at gmail.com>
> > Sent: Thursday, April 23, 2020 20:31
> > To: Zhang, Hawking <Hawking.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
> > Subject: Re: [PATCH 1/2] drm/amdgpu: stop cp resume when compute ring test failed
> >
> > Am 23.04.20 um 11:01 schrieb Hawking Zhang:
> >> The driver should stop CP resume once a compute ring test has failed.
> > Mhm, we intentionally ignored those errors because the compute rings sometimes don't come up again after a GPU reset.
> >
> > We even have the necessary logic in the SW scheduler to redirect the jobs to another compute ring when one fails to come up again.
> >
> > Christian.
> >
> >> Change-Id: I4cd3328f38e0755d0c877484936132d204c9fe50
> >> Signed-off-by: Hawking Zhang <Hawking.Zhang at amd.com>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 4 +++-
> >> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 4 +++-
> >> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
> >> 3 files changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> index b2f10e3..fcee758 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> @@ -3132,7 +3132,9 @@ static int gfx_v7_0_cp_compute_resume(struct amdgpu_device *adev)
> >>
> >>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> >>  		ring = &adev->gfx.compute_ring[i];
> >> -		amdgpu_ring_test_helper(ring);
> >> +		r = amdgpu_ring_test_helper(ring);
> >> +		if (r)
> >> +			return r;
> >>  	}
> >>
> >>  	return 0;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> >> index 6c56ced..8dc8e90 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> >> @@ -4781,7 +4781,9 @@ static int gfx_v8_0_cp_test_all_rings(struct amdgpu_device *adev)
> >>
> >>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> >>  		ring = &adev->gfx.compute_ring[i];
> >> -		amdgpu_ring_test_helper(ring);
> >> +		r = amdgpu_ring_test_helper(ring);
> >> +		if (r)
> >> +			return r;
> >>  	}
> >>
> >>  	return 0;
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> >> index 09aa5f5..20937059 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> >> @@ -3846,7 +3846,9 @@ static int gfx_v9_0_cp_resume(struct amdgpu_device *adev)
> >>
> >>  	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> >>  		ring = &adev->gfx.compute_ring[i];
> >> -		amdgpu_ring_test_helper(ring);
> >> +		r = amdgpu_ring_test_helper(ring);
> >> +		if (r)
> >> +			return r;
> >>  	}
> >>
> >>  	gfx_v9_0_enable_gui_idle_interrupt(adev, true);
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx