[PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery

Yang, Stanley Stanley.Yang at amd.com
Wed Oct 18 06:00:41 UTC 2023


[AMD Official Use Only - General]

Thanks, will update.

Regards,
Stanley
> -----Original Message-----
> From: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Sent: Wednesday, October 18, 2023 11:00 AM
> To: Yang, Stanley <Stanley.Yang at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Yang, Stanley <Stanley.Yang at amd.com>
> Subject: RE: [PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring
> test during ras gpu recovery
>
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> > Stanley.Yang
> > Sent: Tuesday, October 17, 2023 10:37 PM
> > To: amd-gfx at lists.freedesktop.org
> > Cc: Yang, Stanley <Stanley.Yang at amd.com>
> > Subject: [PATCH Review 1/1] drm/amdgpu: Workaround to skip kiq ring
> > test during ras gpu recovery
> >
> > This is a workaround: the kiq ring test fails in the suspend stage
> > during ras recovery for gfx v9_4_3.
> >
> > Change-Id: I8de9900aa76706f59bc029d4e9e8438c6e1db8e0
> > Signed-off-by: Stanley.Yang <Stanley.Yang at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index 9a158018ae16..902e60203809 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -29,6 +29,7 @@
> >  #include "amdgpu_rlc.h"
> >  #include "amdgpu_ras.h"
> >  #include "amdgpu_xcp.h"
> > +#include "amdgpu_xgmi.h"
> >
> >  /* delay 0.1 second to enable gfx off feature */
> >  #define GFX_OFF_DELAY_ENABLE         msecs_to_jiffies(100)
> > @@ -501,6 +502,9 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
> >  {
> >       struct amdgpu_kiq *kiq = &adev->gfx.kiq[xcc_id];
> >       struct amdgpu_ring *kiq_ring = &kiq->ring;
> > +     struct amdgpu_hive_info *hive;
> > +     struct amdgpu_ras *ras;
> > +     int hive_ras_recovery;
> >       int i, r = 0;
> >       int j;
> >
> > @@ -521,6 +525,23 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id)
> >                                          RESET_QUEUES, 0, 0);
> >       }
> >
> > +     /**
> > +      * This is a workaround: skip the kiq_ring test only
> > +      * during ras recovery in the suspend stage for gfx v9_4_3
> > +      */
> > +     hive = amdgpu_get_xgmi_hive(adev);
> > +     if (hive) {
> [Tao] hive_ras_recovery should have a default value if !hive.
> With that fixed, the patch is:
>
> Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
>
> > +             hive_ras_recovery = atomic_read(&hive->ras_recovery);
> > +             amdgpu_put_xgmi_hive(hive);
> > +     }
> > +
> > +     ras = amdgpu_ras_get_context(adev);
> > +     if ((amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3)) &&
> > +             ras && (atomic_read(&ras->in_recovery) || hive_ras_recovery)) {
> > +             spin_unlock(&kiq->ring_lock);
> > +             return 0;
> > +     }
> > +
> >       if (kiq_ring->sched.ready && !adev->job_hang)
> >               r = amdgpu_ring_test_helper(kiq_ring);
> >       spin_unlock(&kiq->ring_lock);
> > --
> > 2.25.1
>
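The review comment about the default value can be illustrated with a small
standalone sketch. This is not the driver code: the struct names, the plain
int fields standing in for atomic_t, and the helper skip_kiq_ring_test() are
all hypothetical, chosen only to show why hive_ras_recovery needs an initial
value when there is no hive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct hive_stub { int ras_recovery; };
struct ras_stub  { int in_recovery; };

/*
 * Decide whether the KIQ ring test should be skipped.
 * hive or ras may be NULL, mirroring a device without an XGMI hive
 * or without a RAS context.
 */
static bool skip_kiq_ring_test(bool is_gc_9_4_3,
                               const struct hive_stub *hive,
                               const struct ras_stub *ras)
{
    /* Default value: no hive means no hive-wide recovery is in progress. */
    int hive_ras_recovery = 0;

    if (hive)
        hive_ras_recovery = hive->ras_recovery;

    /* Skip only on the affected IP version while some recovery is running. */
    return is_gc_9_4_3 && ras &&
           (ras->in_recovery || hive_ras_recovery);
}
```

Without the initializer, the !hive path would read an indeterminate value in
the final condition, so the skip decision could vary from call to call.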


