[PATCH v3] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

Felix Kuehling felix.kuehling at amd.com
Mon Nov 30 17:35:26 UTC 2020


As I stated elsewhere, I would recommend noretry=0 for Navi and later
GPUs because there is no performance advantage from disabling retry on
those GPUs.
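To make the recommendation concrete, here is a minimal userspace sketch of how the default selection in amdgpu_gmc_noretry_set() (quoted below) would look with Navi falling back to noretry=0. It assumes Vega20 keeps its noretry=1 default, since the recommendation only covers Navi and later; asic_type and model_noretry_suggested() are hypothetical stand-ins, not kernel code.

```c
/* Hypothetical stand-in for the kernel's enum amd_asic_type; only the
 * ASICs discussed in this thread are listed. */
enum asic_type { CHIP_RAVEN, CHIP_VEGA20, CHIP_ARCTURUS, CHIP_NAVI10, CHIP_NAVI14 };

/* Hypothetical variant of the per-ASIC default: only Vega20 (assumed
 * unchanged) defaults to noretry = 1. Navi parts fall through to
 * noretry = 0, i.e. retry faults stay enabled, because disabling retry
 * brings no performance advantage there.
 * module_noretry models amdgpu_noretry; -1 means "use the driver default". */
static int model_noretry_suggested(enum asic_type asic, int module_noretry)
{
    if (module_noretry != -1)
        return module_noretry; /* an explicit module parameter always wins */

    switch (asic) {
    case CHIP_VEGA20:
        return 1;
    case CHIP_NAVI10:
    case CHIP_NAVI14:
    case CHIP_RAVEN:
    default:
        return 0;
    }
}
```

As in the patch, an explicit amdgpu.noretry=N still overrides the per-ASIC default in this sketch; only the automatic (-1) case changes for Navi.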


Regards,
  Felix


On 2020-11-30 at 12:22 p.m., Deucher, Alexander wrote:
>
>
> We need to figure out what the root cause is, then.  If we can't figure
> it out soon, we should revert the change for navi1x and keep debugging
> until we find the root cause and can safely re-enable it.
>
> Alex
> ------------------------------------------------------------------------
> *From:* Chen, Guchun <Guchun.Chen at amd.com>
> *Sent:* Sunday, November 29, 2020 2:22 AM
> *To:* Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>; Kuehling, Felix
> <Felix.Kuehling at amd.com>
> *Cc:* Gui, Jack <Jack.Gui at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>;
> amd-gfx mailing list <amd-gfx at lists.freedesktop.org>; Huang, Ray
> <Ray.Huang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>;
> Zhang, Hawking <Hawking.Zhang at amd.com>
> *Subject:* RE: [PATCH v3] drm/amd/amdgpu: set the default value of
> noretry to 1 for some dGPUs
>  
> Hi Bas Nieuwenhuizen,
>
> I don't think a direct revert is the right approach, though it would
> fix your problem.  noretry=0 will cause other test failures on several
> ASICs.
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Bas
> Nieuwenhuizen
> Sent: Sunday, November 29, 2020 8:38 AM
> To: Kuehling, Felix <Felix.Kuehling at amd.com>
> Cc: Gui, Jack <Jack.Gui at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>;
> Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx mailing list
> <amd-gfx at lists.freedesktop.org>; Huang, Ray <Ray.Huang at amd.com>;
> Deucher, Alexander <Alexander.Deucher at amd.com>; Zhang, Hawking
> <Hawking.Zhang at amd.com>
> Subject: Re: [PATCH v3] drm/amd/amdgpu: set the default value of
> noretry to 1 for some dGPUs
>
> Can we revert this patch to fix
> https://gitlab.freedesktop.org/drm/amd/-/issues/1374 ?
>
> On Thu, Oct 15, 2020 at 4:30 PM Felix Kuehling
> <felix.kuehling at amd.com> wrote:
> >
> > On 2020-10-14 at 11:35 p.m., Chengming Gui wrote:
> > > noretry = 0 causes KFD page fault tests to fail on some dGPUs, so set
> > > noretry to 1 for these special ASICs:
> > > vega20/navi10/navi14/ARCTURUS
> > >
> > > v2: merge the raven and default cases since they use the same setting
> > > v3: remove ARCTURUS
> > >
> > > Signed-off-by: Chengming Gui <Jack.Gui at amd.com>
> > > Change-Id: I3be70f463a49b0cd5c56456431d6c2cb98b13872
> >
> > Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
> >
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 23 +++++++++++++++--------
> > >  1 file changed, 15 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > > index 36604d751d62..f26eb4e54b12 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > > @@ -425,20 +425,27 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
> > >       struct amdgpu_gmc *gmc = &adev->gmc;
> > >
> > >       switch (adev->asic_type) {
> > > -     case CHIP_RAVEN:
> > > -             /* Raven currently has issues with noretry
> > > -              * regardless of what we decide for other
> > > -              * asics, we should leave raven with
> > > -              * noretry = 0 until we root cause the
> > > -              * issues.
> > > +     case CHIP_VEGA20:
> > > +     case CHIP_NAVI10:
> > > +     case CHIP_NAVI14:
> > > +             /*
> > > +              * noretry = 0 will cause kfd page fault tests to fail
> > > +              * for some ASICs, so set default to 1 for these ASICs.
> > >                */
> > >               if (amdgpu_noretry == -1)
> > > -                     gmc->noretry = 0;
> > > +                     gmc->noretry = 1;
> > >               else
> > >                       gmc->noretry = amdgpu_noretry;
> > >               break;
> > > +     case CHIP_RAVEN:
> > >       default:
> > > -             /* default this to 0 for now, but we may want
> > > +             /* Raven currently has issues with noretry
> > > +              * regardless of what we decide for other
> > > +              * asics, we should leave raven with
> > > +              * noretry = 0 until we root cause the
> > > +              * issues.
> > > +              *
> > > +              * default this to 0 for now, but we may want
> > >                * to change this in the future for certain
> > >                * GPUs as it can increase performance in
> > >                * certain cases.
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx

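For reference, the default selection implemented by the quoted v3 patch can be modeled in userspace as below. This is a sketch: asic_type and model_noretry_v3() are hypothetical stand-ins for enum amd_asic_type and amdgpu_gmc_noretry_set(), and only the ASICs named in the thread are listed. Per the v3 changelog, Arcturus is no longer in the noretry=1 list and falls through to the default case, even though the commit message still names it.

```c
/* Hypothetical stand-in for the kernel's enum amd_asic_type. */
enum asic_type { CHIP_RAVEN, CHIP_VEGA20, CHIP_ARCTURUS, CHIP_NAVI10, CHIP_NAVI14 };

/* Userspace model of amdgpu_gmc_noretry_set() after the v3 patch.
 * module_noretry models the amdgpu_noretry module parameter, where -1
 * means "let the driver pick a per-ASIC default".
 * Returns the value the driver would store in gmc->noretry. */
static int model_noretry_v3(enum asic_type asic, int module_noretry)
{
    switch (asic) {
    case CHIP_VEGA20:
    case CHIP_NAVI10:
    case CHIP_NAVI14:
        /* KFD page fault tests fail with noretry = 0 on these ASICs. */
        return module_noretry == -1 ? 1 : module_noretry;
    case CHIP_RAVEN:
    default:
        /* Raven and all other ASICs (including Arcturus since v3)
         * keep noretry = 0 unless the user overrides it. */
        return module_noretry == -1 ? 0 : module_noretry;
    }
}
```

The Navi entries in the first case group are the ones whose default is in question in issue 1374; reverting for navi1x would amount to moving CHIP_NAVI10 and CHIP_NAVI14 into the default group.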
